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SECTION  1 


INTRODUCTION 


This  project  seeks  to  apply  recently  developed  parallel  Kalman  filter 
(PKF)  methods  to  actual  flight  test  data  obtainable  from  White  Sands  Missile 
Range  (WSMR) .  In  this  Phase  I  feasibility  study,  we  show  that  the  PKF 
methods  offer  excellent  speed  up,  reliable  convergence  characteristics  and 
good  accuracy  compared  with  the  standard  Kalman  filter  (SKF) .  Parallel 
computing  architectures  are  presented  that  enable  the  PKF  methods  to  be 
implemented  at  15  KHz  sample  rates  at  64-bit  floating-point  precision. 

This  project  is  innovative  in  that  the  PKF  architectures  utilize  both 
horizontal  and  vertical  parallelism.  A  balance  of  nonlinear  (e.g.,  trigono¬ 
metric  functions,  exponentials,  squares,  square-roots)  and  linear  (e.g.,  add, 
subtract,  multiply,  divide)  computing  resources  rated  at  nearly  25  double¬ 
precision  MFLOPs  (million  floating-point  operations  per  second)  each  is 
recommended.  Coupling  this  with  an  industry-standard,  Vme  bus  chassis 
provides  an  open  architecture  to  permit  other  WSMR  contractors  to  add  to  the 
sys  tem. 

1 . 1  BACKGROUND 

To  illustrate  the  need  to  develop  Kalman  filter  parallel  processing 

architectures,  the  total  number  of  arithmetic  operations  that  must  be 

computed  in  the  Kalman  filter  algorithm  can  be  counted  and  multiplied  by 

different  multiplier  and  adder  speeds.  For  the  Kalman  filter  algorithm 

given  in  Table  1-1,  the  total  number  of  multiplications,  additions  and 

2  2 

divisions  are  given  by:  (n  +  2n  -  1)  additions,  (2n  +  4n  +  1)  multi¬ 

plications,  and  1  division.  Therefore,  the  overall  execution  time  needed 
to  update  the  Kalman  filter  algorithm  in  Table  1-1  is: 


(n^  +  2n  -  1)  t  +  (2n^  +  4n  +  1)  t  +  t  , 
a  m  d 


(l.D 
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TABLE  1-1: 

STANDARD  KALMAN  FILTER  ALGORITHM* 

Kalman* 

Gain 

\  *  v*)hMv+,hI  *  ir‘ 

Filter 
l1  pda  te 

V+)  =  +  Vyk  -  Vk{*)) 

■  (I  -  W  V->  +  Vk 

Covariance 

Update 

V+>  •  (I  -  W  pk<-> 

Measurement 

Update 

yk+l  *  Hk+1  xk(+) 

*  Note  that  when  scalar  measurements  are  processed,  the  Inverse 
operation  reduces  to  a  division  operation. 


where  tg  is  the  addition  time,  t^  is  the  multiplication  time,  and  td 

is  the  division  time.  For  example,  if  t  «  200  nsec,  t  -  200  nsec  and 

a  ’  m 

td  “  10  usee,  one  pass  through  the  Kalman  filter  with  n  «  9  states 
theoretically  requires  only  69  usee.  This  corresponds  to  1/69  u-.ec  *  14.5 
KHz  update  rate  using  state-of-the-art  32-bit  floating-point  VLSI  chips. 
Since  most  systems  deliver  about  207.  of  theoretical  peak  performance,  to 
keep  up  with  real-time  requirements  the  theoretical  speed  needs  to  be  1/20'. 
■  5x  faster  than  real  time.  Thus,  the  Kalman  filter  equations  must  be 
computed  theoretically  at  5x15  KHz  *  75  KHz  rate  to  deliver  real-time 
performance  at  15  KHz.  Since  it  is  well  known  that  the  Kalman  filtering 
must  be  performed  using  floating-point  arithmetic,  the  only  viable  method 
to  increase  the  throughput  of  the  Kalman  filter  by  say  5  to  10  is  with 
parallel  processing.  Optical  processing  is  fast  but  optical  fixed-point 
can  cause  stability  problems  with  the  Kalman  filter.  Nonlinear  or  extended 
Kalman  filtering  is  even  more  computationally  demanding.  To  perform  non¬ 
linear  filtering  in  real  time  at  15  KHz,  a  10  to  100  speed  up  is  needed. 

Note  that  although  much  progess  has  been  made  in  floating-point  adders 
and  multipliers,  the  real  problem  is  fast  hardware  divide  (matrix  inversion) 
is  needed  in  the  Kalman  filter.  Performing  division  in  parallel  on  vector/ 
matrix  elements  is,  therefore,  required  to  speed  up  Kalman  filter 
computations. 
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Nonlinear  Extended  Two/Four-Processor  Parallel  Kalman  Filter 


The  nonlinear  EKF  and  two-  and  four-processor  EPKF  have  been  coded  for 
a  realistic  example  Involving  gravity,  drag  and  measurement  noise.  The 
two-processor  and  four-processor  EPKF  estimates  agree  quite  well  with  the 
original  EKF  estimates.  The  EPKF  utilized  a  parallel  trapezoidal  rule  for 
integrating  the  state  and  covariance  updates  in  the  EPKF. 

Two/Four-Processor  Parallel  Trapezoidal  Rule 

The  two-  and  four-processor  parallel  trapezoidal  rule,  ode  solver 
generally  provided  the  same  solution  in  a  given  number  of  iterations 
compared  with  the  sequential  trapezoidal-rule  ode  solver.  Because  each 
iteration  of  the  parallel  trapezoidal  rule  can  run  on  two  or  four 
processors  simultaneously,  the  time  to  solution  was  reduced  almost 
linearly  with  the  number  of  processors. 

The  floating-point  primitives  and  macros  for  the  PKF  have  been  coded  on 
the  481.  Specifically,  quad  adds  (i.e.,  four  additions),  multiplies, 
subtracts,  divides,  etc.,  have  been  written  and  put  into  PROMs  on  the  481 
cards.  The  linking  of  these  primitives  into  macros  for  say  the  Kalman 
gain  update  is  feasible. 

Accuracy,  Speed  and  Implementation  Considerations 

The  accuracy  of  the  parallel  Kalman  filter  degrades  somewhat  as  more 
processors  are  utilized.  It  is  recommended  that  no  more  than  four  to 
eight  processors  be  used  per  target  state  but  distribute  the  computations 
over  each  state  variable.  Specifically,  a  nine-state  target  model  may 
use  36  processors  (four  per  state  variable)  with  excellent  accuracy  and 
speed  up. 

To  meet  the  WSMR  target  data  processing  requirements,  25  double-preci¬ 
sion  MFLOPs  speed  are  needed  on  linear  (and  nonlinear)  computations. 

Thus,  a  balanced  system  architecture  capable  of  50  MFLOPs  in  total  is 
recommended.  A  20  slot  Vme  bus  based  tracking  system  based  on  a  Motorola 
68030  master,  twelve  Systolic-482  cards  (using  Motorola  68882),  one  array 
processor  based  on  the  new  TI  SN74ACT8847  and  a  SCSI  disk/streaming  tape 
subsystem  is  recommended. 


Benefits  Assessment 


Our  Phase  I  study  determined  that  without  parallel  processing  it  is 
not  possible  to  process  radar  data  in  real  time  with  a  Kalman  filter  based 
on  today's  data  sample  rates.  Hence,  the  major  benefit  of  parallel  pro¬ 
cessing  (more  specifically  parallel  Kalman  filtering)  is  that  it  enables 
real-time  radar  data  processing  that  could  not  be  performed  otherwise. 

This  translates  to  quick-look  and  improved  down  range  safety  during  missile 
flight  testing. 

More  specifically,  the  major  benefits  of  the  proposed  Phase  I  and  11 
research  include: 

o  A  target  tracking  test  facility  that  can  be  used  in  the  lab 
or  in  the  field  for  "quick-look"  analysis  of  flight  data 
improving  safety  and  flight  data  quality 

o  A  parallel  processing  test  facility  that  provides  advanced 
state-of-the-art  computing  resources  to  solve  WSMR  target 
tracking  problems  that  could  not  be  solved  otherwise. 

o  An  industry-standard  parallel  computing  environment  that 
can  enable  the  validation  of  new  parallel  Kalman  filter 
algorithms  and  architectures  as  they  become  available. 


1.1  TECHNICAL  OBJECTIVES 


The  major  objectives  of  the  proposed  Phase  I  &  II  research  include: 

o  Ability  to  track  multiple  targets  at  sample  rates  approaching 
15  KHz  for  real-time  applications 

o  Realize  between  two  and  three  orders  of  magnitude  improvement 
(100  to  lOOOx)  in  computational  speed  over  existing  Kalman- 
filter-based  tracking  systems 

o  Design  and  optimize  application-specific,  parallel  processing 
hardware  for  real-time  target  tracking 

o  Fabricate,  test  and  deliver  a  parallel  computing  system  that 
is  well  suited  for  implementing  parallel  Kalman  filter 
tracking  algorithms,  FFT  computations  and  target  motion 
resolution  algorithms 

o  Validate  the  proposed  hardware  and  software  by  thoroughly 
exercising  the  system  with  actual  missile  flight  test  data 
available  from  WSMR 

o  Deliver  the  parallel  computing  test  facility  and  demonstrate 
the  accuracy  and  speed  benefits  of  the  parallel  Kalman  filter 
methods  by  solving  p  realistic  target  tracking  application  of 
strategic  interest  to  WSMR 


1.4  SUMMARY 

The  remainder  of  this  report  is  divided  into  four  sections.  Section  2 
provides  an  overview  of  our  technical  approach  including  math  modeling. and 
the  parallel  Kalman  filter  algorithms.  Section  3  derives  parallel  computing 
architectures  which  are  well  suited  for  fast  implementation  of  the  PKF 
methods.  Section  4  illustrates  the  performance  of  the  PKF  methods  by  solving 
two  realistic  test  cases  with  the  PKF  methods.  Conclusions  and  recommenda¬ 
tions  are  given  in  Section  3. 
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SECTION  2 


TECHNICAL  APPROACH 


2.1  OVERVIEW  OF  PHASE  I  &  II  TECHNICAL  APPROACH 

Our  technical  approach  is  shown  In  Figure  2-1.  Our  approach  is  based  on 
(1)  our  in-depth  understanding  of  the  computat ional  requirements  associated 
with  Kalman  filtering  and  (2)  two  meetings  with  Foo  Lam,  Elwin  Nunn  and  Bob 
Green  to  discuss  WSMR  missile  data  processing  requirements.  Our  technical 
approach  involves  the  design  of  new  hardware  as  well  as  system  integration  of 
off-the-shelf  components. 

64-Bit  Array  Processor  Design 

The  new  design  activity  focusses  mainly  on  next  generation  array 
processor  technology  for  64-bit  floating-point  FFTs  (useful  in  precise  TMR) , 
ill-conditioned  linear  equation  solvers  and  other  linear  algebra  (matrix/ 
vector  operations)  needed  by  the  parallel  Kalman  filter.  Our  plan  is  to 
survey  newly  announced  floating-point  processors,  fast  memory  and  address 
generators  from  several  vendors  prior  to  design  in.  Since  the  project  is 
three  years  in  duration,  the  technology  selected  must  provide  significant 
performance  gains  to  be  state-of-the-art  three  years  from  now. 

Off-the-Shelf  Components 

As  indicated  later  in  this  report,  twelve  Systolic-481  parallel 
numeric  processors  are  well  matched  for  the  WSMR  target  tracker.  To 
ensure  maximum  performance  the  16  MHz  68881s  in  the  Systolic-481  shall  be 
upgraded  to  25  MHz  68882s  to  increase  performance  by  nearly  lOx  (an  order 
of  magnitude)  per  481  board.  With  twelve  Systolic-481  cards  in  the  test¬ 
bed,  this  upgrade  represents  12  x  10  *  120x  improvement  in  computational 
performance  (i.e.,  over  two  orders  of  magnitude  over  a  single  16  MHz 
Motorola  68020/68881  32-bit  microprocessor  pair.  These  off-the-shelf 
components  shall  be  purchased  along  with  fast  memory  cards,  host  inter¬ 
face,  Vme  bus  chassis,  etc.  Our  strategy  shall  be  to  delay  the  purchase 
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Figure  2-1:  Phase  II  Technical  Approach 


of  components  as  long  as  possible  (perhaps  until  the  end  of  the  second 
year)  to  reduce  cost  and  provide  maximum  performance  within  the  budget. 

Hence,  much  of  the  first  year  effort  will  be  on  the  64-bit  array  processor 
design  and  fitting  math  models  to  the  WSMR  test  data. 

Math  Modeling 

Parallel  algorithms,  such  as  the  parallel  maximum  likelihood  method, 
shall  be  investigated  for  fitting  math  models  to  the  WSMR  test  data.  Like 
the  PKF  methods  utilized  in  Phase  1,  parallel  trapezoidal  rule  can  be 
incorporated  into  the  parallel  maximum  likelihood  method  for  missile  model 
parameter  identification.  Code  development  for  this  activity  can  be 
initially  done  on  a  Systolic-481-based  parallel  computer.  Later  in  the 
contract  the  68881  chips  can  be  replaced  with  fast  25  MHz  68882  chips 
prior  to  delivery. 

Parallel  Kalman  Filtering 

The  parallel  Kalman  filter  methods  shall  be  employed  to  process  actual 
missile  flight  test  data  under  Phase  II.  The  two-  and  four-processor  parallel 
Kalman  filter  methods  shall  be  used  to  estimate  each  missile  state  variable 
simultaneously  in  parallel.  If  a  nine-state  model  is  utilized,  9x4  =  36 
processors  shall  be  employed  in  the  parallel  Kalman  filter. 

The  speed  of  the  parallel  Kalman  filter  formulation  and  parallel 
computing  testbed  shall  be  compared  to  a  single  M68020/68881  in  a  SUN  3/260 
workstation,  Micro  VAX  II  and  Intel  386/387  in  an  IBM  PS/2  Model  80. 


System  Validation  and  Performance  Enhancements 


The  integrity  of  the  WSMR  math  model,  the  parallel  Kalman  filter 
algorithms,  the  parallel  architectures,  and  the  parallel  computing  testbed 
shall  all  be  validated  under  the  Phase  II  work  plan.  This  can  be  accom¬ 
plished  by  comparing  the  numerical  results  of  the  parallel  formulation 
with  the  numerical  results  of  the  standard  Kalman  filter  on  a  sequential 
computer.  In  the  limit  as  the  step  size  approaches  zero,  the  parallel  and 
sequential  methods  should  give  the  same  results.  This  shall  be  confirmed 
under  Phase  II. 


2.2  MATH  MODEL  DEVELOPMENT  FROM  ACTUAL  FLIGHT  TEST  DATA 


2.2.1  Analysis  of  the  Flight  Data 

Our  objective  is  to  determine  a  math  model  and  its  associated 
parameters  from  flgiht  test  data.  The  flight  data  provided  by  White 
Sands  Missile  Range  (WSMR)  was  unclassified.  The  flight  data  covered  a 
ten  minute  mission  of  six  missiles  launched  in  December  1987.  Although 
over  three  Mbytes  of  data  was  provided,  it  is  likely  that  only  a  small 
portion  of  the  data  is  required  for  modeling  purposes. 

Detrending  the  Data 

The  mean  (i.e.,  average)  and  covariance  associated  with  the  flight 
data  shall  be  computed.  The  data  can  then  be  detrended  by  subtracting 
the  mean  from  the  data.  The  resulting  data,  although  still  noisy,  can 
then  be  used  for  modeling. 

Modeling  Considerations 

The  data  shall  be  divided  into  two  parts  -  a  training  set  and  test 
set.  The  training  set  data  can  be  selected  by  computing  the  information 
matrix  (inverse  covariance  matrix)  along  the  data.  The  objective  is  to 
find  a  data  set  that  has  maximum  information  (i.e.,  when  the  norm  of  the 
information  matrix  is  maximum) .  The  max  norm  data  set  is  then  used  to 
compute  a  math  model  of  the  data.  The  test  set  (all  data  other  than  the 
training  set)  is  used  to  test  the  "goodness  of  fit"  of  the  model  to  the 
training  set. 

2.2.2  Math  Modeling  With  Neural  Networks 

Neural-network-based  modeling  algorithms  are  self-organizing  (or 
adaptive  learning)  methods,  in  that  the  "best"  model  is  constructed  from 
all  possible  combinations  of  sensor  measurement  data.  Thus,  no  data  event 
is  overlooked.  This  feature  of  neural-network-type  modeling  algorithms 
along  with  the  fact  that  all  math  models  can  be  generated  simultaneously 
(in  parallel)  makes  neural -network- type  algorithms  particularly  well 
suited  for  real-time  missile  modeling  applications. 


TX.T*.1 
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To  illustrate  how  a  parallel  neural-network-type  algorithm  is  used 
for  math  modeling,  consider  the  following  example: 


In  the  parallel  neural  net  algorithm,  let 


yA  -  aQ  +  aJx1  +  a^  +  noise 


1,  2 ,  « •  * ,  n 


(2.1) 


where  y^  are  dependent  sensor  measurement  values  which  depend 
on  the  independent  state  values  x^  and  x2  . 


The  objective  is  to  find  the  "best"  estimate  of  a^,  and  a 2  in  a 


least  square  sense.  To  estimate  a^, 
equations  can  be  solved  as  follows: 

a2 

and 

a3  * 

’  N 
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(2.2) 


Note  that  the  terms  in  the  normal  equations  can  be  solved  rapidly  using  an 
array  processor  to  compute  the  sum-of-products  and  sum-of-squares  operations. 
In  addition,  due  to  the  independence  of  the  data,  multiple  array  processors 
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can  be  run  in  parallel  to  construct  all  possible  model  forms  (see  Figure  5-1).  ^ 

The  resulting  model  forms  can  be  compared  and  the  "best"  model  selected  based  (V« 


-  a2dL  +  cxd2 

(2.7 

;  “  c3d3 

(2.8 

iven  the  least  square  values  of  A... 

-A\±. 

— 2J- 
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y  can  be  esti 

mated  from  the  oriEinal  independent  variables  x,  .  x„.  and  x_.  Note  that 


olvnomials  is  ideal  for  constructing  math  models  in  terms  of  known 


hysical  variables.  Also,  note  that  the  least  squares  curve  fit  can  be 


erformed  simultaneously  with  multiple  array  processors.  The  availabilit; 


of  high  performance  board-level  array  processors  makes  large  neural-net- 


based  estimators  feasible. 

Note  that  the  neural-network-based  modeline  algorithm  is  not  limited 


to  linear  polynomials  and  in  fact  can  be  generalized  for  quadratic  and  cubic 


olvnomials  among  others.  Thus,  the  neural-net  method  can  be  used  to  con 


struct  nonlinear  models  from  test  data.  These  models,  however,  mav  not  be 


related  back  to  the  original  measurement  variables  as  easily  and.  as  such 


may  have  less  physical  meanin 
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Figure  2-2:  Neural  Network  for  Least  Squares  Estimation  of  Model  Parameters 

The  information  flow  in  the  generalized  neural  net  modeling  approach 
is  illustrated  in  Figure  2-3. 
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Figure  2-3:  Generalized  Neural  Network  for  Model  Determination 

2.2.3  Real-Time  Estimation  of  the  Target  Measurement  Matrix  Using  Neural 
Nets 

Now  that  the  neural  network  technology  has  been  discussed,  we  shall 
see  how  it  can  be  applied  to  real-time  estimation  of  the  target  measure¬ 
ment  matrix,  H.  A  bank  of  neural  nets  are  employed  to  estimate  several 
candidate  values  of  H  (the  target  measurement  matrix)  in  Figure  2-4. 
Each  neural  net  is  given  a  precomputed  trajectory  corresponding  to  a 
candidate  target  type  and  measurement  values  of  the  actual  target. 


I 


Each  neural  net  computes  a  least  squares  estimate  for  H. 


a 
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For  example,  if  H  is  a  2x1  matrix,  then  the  elements  of  the  H 
matrix  can  be  found  by  solving  the  following  set  of  "normal"  equations: 


£  *n,  ” 


lk  *2k  hll 


£  “lk  *2k  £  X2k' 


x  y 
lk  7lk 


x  v 

k  2k  'ik. 


(2.9) 


Then  the  least  squares  estimate  of  H  for  the  i  neural  net  is  given  by: 


^  -  (  hn  h12  ) 


(2.10) 


This  process  is  performed  simultaneously  (in  parallel)  by  each  neural 
net.  The  innovations  for  each  neural  network  are: 


2k  Hi  ”  yk  “  Hi  xk 


(2.11) 


Taking  the  norm  of  each  innovation  e^^  ■  provides  a  measure 

of  the  size  of  the  accumulated  estimation  error  for  each  neural  net.  The 
network  with  the  smallest  error,  e  ^  *  min  ei»  provides  the  best  esti- 

*  *  * 

mate  of  ■  H  .  Once  is  known,  the  class  of  target  is  classified 

immediately  since  it  corresponds  to  one  of  the  precomputed  candidate 

i  * 

target  trajectories  Now  that  H  is  known,  the  actual  target 

trajectory  can  be  determined  by  solving  the  following  system  of  linear 
equations  at  every  time  step  k. 
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Thus,  the  neural  net  learns  H  on-line  and  we  use  H  to  estimate  the 


target  trajectory,  x^,  for  the  missile. 
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2. 2. A  Parallel  Maximum  Likelihood  Parameter  Identification  for  Nonlinear 


Missile  Math  Models 

The  least  square  parameter  estimates  obtained  with  the  parallel  neural- 
net  method  are  biased  estimates  of  the  true  missile  parameters.  To  remove 
the  bias  and  ensure  that  the  parameters  are  consistent  over  the  full  dynamic 
range  of  the  missile  model,  the  parallel  maximum  likelihood  (PML)  method 
and/or  the  parallel  maximum  a  priori  (PMAP)  method  can  be  used.  The  refine¬ 
ment  of  the  nonlinear  model  parameters  can  be  handled  directly  or  cast  as  a 
nonlinear  two-point  boundary  value  problem  (NTPBVP) .  The  NTPBVP  can  be 
solved  using  the  parallel  shooting  method.  In  addition,  any  unknown  states 
of  the  missile  model  can  be  reconstructed  simultaneously  with  the  aerodyna¬ 
mic  parameters  using  the  PML  and/or  PMAP  method.  In  the  remainder  of  this 
section,  the  parallel  maximum  likelihood  (PML)  and  parallel  maximum  a 
posteriori  (PMAP)  methods  are  described  in  detail. 

Parallel  Maximum  Likelihood  Method 


The  maximum  likelihood  method  of  parameter  identification  has  been 
proven  to  be  highly  successful  over  the  years  in  aerospace  application. 
Parallel  processing  technology  can  be  embedded  into  the  standard  maximum 
likelihood  method  to  speed  up  computations.  In  particular,  the  parallel 
maximum  likelihood  (PML)  method  uses  (1)  parallel  numerical  methods  for 
integrating  the  missile's  equations  of  motion,  (2)  a  parallel  Kalman  filter 
to  construct  the  innovations  and  (3)  parallel  optimization  methods  to  select 
the  best  parameter  values  which  minimize  the  ML  performance  index.  Mathe¬ 
matically,  the  maximum  likelihood  method  for  missile  parameter  identifica¬ 
tion  can  be  stated  as  follows: 

Consider  the  following  nonlinear  missile  state  and  measurement 


model : 


x(t)  -  f (x(t)  ,6  (t)  ,t)  +  0w(t) 


z(t)  -  h(z(t) ,6 (t) ,t)  +  v ( t ) 


(2.13) 


(2.14) 


where  x(t)  is  an  n-dimensional  state  vector,  0(t)  is  an  r-dimen- 
sional  vector  of  unknown  parameters,  w(t)  is  a  p-dimensional  process 
noise,  z(t)  is  an  m-dimensional  measurement  vector  which  has  been 
corrupted  by  the  measurement  noise,  v(t). 


The  maximum  likelihood  parameter  estimates  are  obtained  by  minimizing 
the  following  negative  log  likelihood  function: 


J  ( 6)  =  ^  {  v  B  v  +  log  |  B  |  | 


(2.15) 


where 


z  -  h  [x(t) ,  8(t) ,  t] 


(2.16) 


1 

B  ■  E[vv  ]  is  the  covariance  of  the  innovations  v  . 

The  innovations  (v)  and  the  covariance  of  the  innovations  (B)  are 
obtained  from  an  extended  Kalman  filter.  The  Kalman  filter  equations  can 
be  reformulated  for  parallel  processing  and  the  parameter  estimates 
can  then  be  updated  using  the  following  rule: 


ek+l  ®k 


(2.17) 


where  is  a  scalar  stepsize  parameter  chosen  to  ensure  that 

J(6^+^)  <  J(-^)  ,  g^  is  a  vector  of  gradients  of  the  negative  log 


likelihood  function 


gk  =  St  6=8, 


(2.18) 


and  is  an  approximation  to  the  Hessian  of  J 


(2.19) 
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Furthermore,  it  can  be  shown  that 
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The  major  computational  savings  due  to  parallel  processing  is  in  evaluating 
g^,  v  and  B  in  the  Kalman  filter.  Clearly,  the  complexity  of  this 
technique  requires  parallel  processing  for  real-time  missile  applications . 
The  stability  of  the  Kalman  filter  computations  also  needs  to  be  closely 
monitored  (say  via  an  expert  system)  so  that  the  innovations  are  proper. 

An  alternative  to  the  parallel  maximum  likelihood  (PHL)  method  is  the 
parallel  maximum-a-posteriori  (FMAP)  method.  The  PMAP  method  is  based  on 
solving  nonlinear  two-point  boundary  value  problems  via  the  parallel 
shooting  method.  Although  this  method  is  batch-processing  oriented  (the 
PML  method  is  recursive  and  as  such,  is  well  suited  fc  •  real-time  applica¬ 
tion),  the  PMAP  method  can  be  useful  in  flight  if  the  batch  processing  can 
be  performed  within  the  sample  time  of  the  measurements.  The  PMAP  method, 
however,  will  provide  superior  smoothed  parameter  estimates  compared  with 
the  PML  method.  Thus,  it  may  be  appropriate  to  consider  the  PMAP  approach 
for  off-line,  ground-based  missile  data  reduction. 

Parallel  Maximum-a  Posteriori  Method 

Consider  the  nonlinear  aircraft  state  and  measurement  model  represented 

by: 

x ( t )  -  f(x(t),t)  +r(x(t),t)  w ( t )  (2.21) 

z(t)  -  h(x(t) ,t)  +  v ( t )  (2.22) 

where  x(t)  is  an  n-dimensional  augmented  state  vector  which  includes  the 
unknown  parameters,  w(t)  is  a  p-dimensional  process  noise,  and  z(t)  is 
the  m-dimensional  measurement  vector  which  has  been  corrupted  by  the 
measurement  noise  v(t). 

T 

Now  let  m  ■  the  mean  of  x(tn)  and  E(x(tn)  x  (t0))  -  p  -  the 
x  O'  0'  0"  x„ 

T 

covariance  of  x(tfi).  Similarly,  let  E(ww  )  "  Q  -  the  process  noise 

T 

covariant  and  e(w  )  ■  R-  the  measurement  noise  covariance.  In  addition, 
suppose  z  and  v  have  a  zero  mean  in  this  analysis  (if  not,  a  bias  term 
can  be  identified). 

Now  let  Z t  »  (z(t)  t^  i  t  -  t)  define  the  accumulated  noisy  state 
measurements  up  to  and  including  time  t.  The  problem  is  to  obtain  an 


k 


y. 


estimate  of  the  augmented  state  vector  x(t)  at  time  t  on  the  basis  of 


the  observations  represented  by  Zg.  Our  interest  will  be  restricted  to 


the  case  when  s  >  t  in  which  case  x  .  is  referred  to  as  a  "smoothed' 

t  s 


estimate  of  x(t) . 


By  defining  p(x;tjzt)  as  the  a  posteriori  probability  that  the 
state  vector  assumes  the  value  x  at  time  t  conditioned  upon  the 
measurement  data  represented  by  Zt,  the  maximum-a-posteriori  (MAP) 


*MAP 

estimate  of  xt|s  (denoted  as  x^“|g)  *s  defined  by 


-MAP 


p(xt  |s;t lzt)  ■  max  p(x;t|zt) 


t0  ~  t  <  s  -  tf 


(2.23) 
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It  has  been  shown  that  the  maximization  indicated  in  Eq.  (5.24)  is 
equivalent  to  finding  the  deterministic  signal,  w(t)  tc(tQ,t^)  which 


minimizes  the  functional 


J  -  h I |x(tQ) 


-  m  ||^  +  *5  ( 

xo  V  Jt 


£<  ]  I  2 ( t )  -  h(J<t),t)  I  I2 


4  |  |w(t) 


1  _1 
Q  (t) 


0 

)  dt 


R  (t) 


(2.24) 


subject  to  the  dynamic-equality  constraint  given  by 


X ( t )  -  f(x(t),t)  4  f(x(t),t)  W ( t ) 


V  t  e  (t0,tf) 


(2.25) 


Note  that  the  SAP  estimation  problem  has  been  converted  to  a  determi¬ 
nistic  optimization  problem  and  that  once  w(t)  is  found  such  that  equation 
(2.24)  is  minimized,  equation  (2.25)  can  be  integrated  to  obtain  the  MAP 
estimate  of  x(t)  provided  x(tg)  is  known. 

To  find  w(t)  using  the  calculus  of  variations,  let  the  Hamiltonian 
be  defined  as 


H  *  4(  I  z 


(t)  -  h(x(t) ,t)  |  j  4  | |w(t)  | | 

R  (t)  Q_1 (t) 


) 


4  A  (t)  (f(x(t),t)  4  r(x(t),t)  w(t)) 


V  t  c  (tQ,tf)  (2.26) 
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The  most  direct  method  for  solving  this  problem  would  be  to  initially 

set  x(tn)  to  m  ,  integrate  Eq.  (2.25)  forward  in  time  over  the  inter- 
U  x0 

val  (t^.t^)  and  evaluate  the  performance  index  (2.24).  By  considering 
changes  in  the  performance  index  due  to  changes  in  xUq),  one  may  use 
this  information  to  decide  if  this  procedure  should  be  repeated.  Specifi¬ 
cally,  if  the  change  in  J  is  sufficiently  small  and  the  gradient  of  J 
equals  zero,  then  x(tQ)  is  accepted.  Otherwise,  the  value  of  x(t^) 
should  be  selected  such  that  the  performance  index  is  minimized.  To  speed 
computations,  parallel  integration  methods  can  be  used  to  integrated  Eq. 
(2.25),  while  the  selection  of  the  next  value  of  x(t^)  can  be  made  using 
an  optimization  method. 

2.3  PARALLEL  KALMAN  FILTERING  BASED  ON  THE  DECOUPLING  PRINCIPLE 

To  speed  up  Kalman  filter  computations,  several  methods  have  been 
considered  during  the  past  decade.  For  linear  filtering,  the  Kalman  filter 
equations  have  been  coded  on  an  array  processor.  This  method  works  well 
for  large  models  but  not  for  relatively  small  models  (models  with  less  than 
20  state  variables).  Small  models  frequently  occur  in  target  tracking, 
ravigation,  guidance  and  control.  Hence,  there  is  a  need  to  develop  fast 
..lman  filter  methods  for  small  models.  In  particular,  in  the  context  of 
target  tracking,  each  target  can  be  represented  by  a  nine-state  model.  The 
problem,  however,  is  to  estimate  the  trajectory  of  multiple  targets  in  real 
time.  Although  many  target  trajectories  can  be  computed  in  parallel ,  ulti¬ 
mate  tracker  performance  is  dependent  on  the  speed  in  which  each  target 
trajectory  can  be  predicted  with  the  Kalman  filter. 

To  provide  a  more  accurate  solution,  the  Parallel  Kalman  Filter  (PKF) 

is  based  on  the  trapezoidal  rule  of  numerical  integration  rather  than  Euler 

integration  which  has  been  used  to  date.  The  Parallel  Kalman  Filter  update 

2 

is  then  accurate  to  0(h  )  rather  than  0(h).  The  additional  accuracy  is 
important  because  it  is  anticipated  that  the  integration  step  size,  h, 
will  be  large  due  to  the  computational  complexity  of  the  Kalman  filter 
equations.  For  example,  with  a  sample  rate  of  100  Hz,  the  Euler-integra¬ 
tion-based  PKF  would  be  accurate  to  0(h)  *  0.01,  while  the  trapezoidal- 

2 

rule-based  PKF  would  be  accurate  to  0(h  )  -  0.0001  (i.e.,  as  accurate  as 

the  12-bit  sensor  data). 
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2.3.1  Nonlinear  Extended  Parallel  Kalman  Filter  Based  on  the  Trapezoidal 
Rule 

The  trapezoidal  rule  for  integrating  a  set  of  ordinary  differential 
equations  is  given  by: 


xk  +  ^  h  (f(xk,tk)  +  f<*k+1.tk+1» 


(2.27) 


where  x  is  the  solution  of  the  ode,  f(x,t)  is  the  right-hand-side  (RhS) 
of  the  initial  value  problem  and  h  «  tk+^  -  tk  is  the  integration  step 
size.  The  trapezoidal  rule  is  an  implicit  method  since  xk+^  appears 
implicitly  on  the  RHS  of  Eq.  (2.27).  Note  that  f(x,t)  can  be,  in 
general,  a  nonlinear  function  or  linear  such  as  f(x,t)  «  Fx(t).  To  solve 
Eq.  (2.27),  a  predictor  is  needed  of  the  form  below  to  estimate  x  • 


Xk+1  *  Xk  +  hf(xk’V 


(2.28) 


Hence,  by  combining  (2.27)  and  (2.28)  we  obtain  a  predictor-corrector  method 


based  on  the  trapezoidal  rule: 


Predictor:  x£  -  x?  +  hf(x5,t.) 


Corrector : 


V‘h  +  f  <xk+l*tk-Ki:> 


(2.29) 


(2.30) 


Note  that  the  predictor  must  be  evaluated  before  the  corrector  equation  can 
be  computed.  A  parallel  predictor-corrector  (PPC)  method  allows  the 
predictor  and  corrector  to  be  evaluated  simulataneouslv  on  two  processors 
as  follows: 


2.3.2  Parallel  Trapezoidal  Rule  (Two  Processors) 


Predictor : 


(2.32) 

(2.32) 


Corrector:  x£  ■  x£_^  +  h  (f£  +  f£  ^)  (2.32) 

where  f£  -  i(x£,k)  and  f£_j  -  f(xk_1,k-l). 

In  the  special  case  when  f£  is  the  RHS  of  the  Kalman  filter  state 
update  before  a  measurement  and  is  the  RHS  of  the  covariance  update 

before  a  measurement  then  the  two-processor  extended  parallel  Kalman  filter 

is  given  by: 


mom 


Nonlinear  Extended  Parallel  Kalman  Filter  Based  on  Trapezoidal  Rule  (Two 
Processors) : 


Time  Update: 

Predictor:  (-)  «  j  <•+)  +  2h  f(x^(-)) 

W->  •  pk-i<+)  +  2h  Gk 


(2.33) 

(2.34) 


Corrector : 


(2.36) 

(2.37) 


where  GP  -  (-)  )Pk_1  (-)-*-PR_1  (-) F  (x^  (-) )  +Qk_j  (2.35) 

xk(-)  -  xk_j(+)  +  >5  h  (f(xk(-))  +  f(xk_j(+))) 

Pk(-)  -  pk_J(+>  +  h  h  (Gk  +  Gk-15 

where  G^  -  F(\_1  (+)  )pk_j  <+)+pk.1  (+)F  (*k.j (+)  )+<^k_l  (2.38) 
Measurement  Update: 

V+)  -  +  V*k  -  hkak'-)>> 

V+>  ■  (I  -  WV-»>  v-> 


WX'V-”  *  (Hk(5k<-»rk<-)»k <*„(-))  ♦  V'1 


(2.39) 

(2.40) 

(2.41) 


where 


w-» 


3f(x(t  )) 

-W 


dh(x(tk>) 


x(t) 


dx(t, 


x(tk)  -  xk(-) 


In  the  above  PKF,  the  (- )  notation  represents  a  value  before  a  measurement 
update  and  the  (  +  )  notation  is  a  value  af ter  a  measurement  update. 
Similarly,  the  p  for  predictor  corresponds  to  the  (-)  notation  and  the 
c  for  corrector  value  corresponds  to  the  (+)  notation. 


Nonlinear  Extended  PKF  Based  on  the  Trapezoidal  Rule  (Four  Processors) 
With  four  processors,  the  parallel  trapezoidal  rule  is  given  bv: 
2.3.3  Parallel  Trapezoidal  Rule  (Four  Processors) 


Predictor : 

xP 

2k+2 

“  x2k-2  +  4  h  f2k 

(2.42) 

xP 

2k+l 

_  c  .  3  .  ,fp  .  rp 

2k-2  T  2  h  a2k  T  1 2k  —  1 ' 

(2.43) 

Corrector : 

X2k  ’ 

xC  -  —  ( 3f P  _  of P  ) 

2k-3  2  *■  J2k  *  2k-l' 

(2.4-0 

X  2k—  1 

x2k-3  +  2  h  f2k-2 

( 2 .  -4  5 ) 

where 

f2k  “  f(x2k,2k)’  f2k-l  *  f (x2k-l ,2k_1 )  and 

f(x2k-2'2k"2) 
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For  the  nonlinear  EPK.F,  the  time  update  is  given  by: 
Time  Update: 

Predictor:  i2k+2(->  -  *2k-2 (+)  +  *  h 

P2k+2 (_)  "  P2k-2(+>  +  4  h  G2k 


X2k+1 


x2k-2(+)  +  T  h  (f2k  +  f2k-l) 


P2k+l(-)  “  P2k-2(+)  +  2  h  (G2k  +  C2k-1* 
Corrector:  £2k<->  -  *2k_3<-)  -  \  (3fPk  -  9fPk_j) 
•s  /  \  /  \  ^  /. s^P  <-w/-»P  \ 


P2k(_) 


P2k-3(_)  ~  2  (3G2k  "  9G2k-l) 


X2k-1(_)  “  X2k-3(-)  +  2  h  f2k-2 
P2k-l(->  -  P2k-3(-)  +  2  h  G2k-2 


p 

2k 

m 

f(x2k(-» 

P 

2k- 

-1 

*  f(x2k-l 

(-)) 

c 

2k- 

-2 

-  f(x2k-2 

(+)) 

P 

'2k 

- 

FU2k<-))P2k(-)  ♦  P2k 

(-)F(x2k 

(-))  + 

Q2k 

■P 

'2k- 

-1 

‘  P(X2k-l 

(-))P2k_i(-) 

+  P2k-1 

(-)F(x 

2  k—  1 

*  Q2k-1 

,c 

'2k- 

-2 

F(x2k-2 

(+) )P2k-2(+) 

+  P2k-2 

(+)  F  (x 

21,-2 

+  Q2k-2 

Measurement  Update: 

X2k-l(+)  “  X2k-1(_)  +K2k-l(z2k-l  '  h2k-l (x2k-l (_) ) } 

P2k-l(+)  “  (I  K2k-lH2k-l (x2k-l (-)))P2k-l (_) 

K2k-1  ‘  P2k-1  (~)H2k-l  (x2k-l  ('):)*(H2k-l  (x2k-l (_)  )P2k-l  (-)H2k-l  (xl 


+  R2k-1} 


(2.46) 

(2.47) 

(2.48) 

(2.49) 

(2.50) 

(2.51) 

(2.52) 

(2.53) 
(2.5-0 

(2.55) 

(2.56) 

(2.57) 
-)) 

(2.58) 

O) 

(2.59) 

(2.60 ) 
(2.61) 

-1(_)) 

(2.62) 


where 


F(x2k-1(-)) 


f(x(t2k-l}) 

x(t2k-l) 


VI 

I 


e: 


x2k-l(-) 


H2k-1 (x2k-l(~)) 


h(x(t2k-l)} 
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This  section  discussed  the  parallel  estimation  methods  that  are  needed 
for  real-time  target  tracking.  These  methods  are  well  suited  for  implemen¬ 
tation  on  several  Systolic-481  parallel  numeric  processors  due  to  the  high 
degree  of  parallelism  associated  with  each  method.  The  next  section 
develops  parallel  architectures  that  are  well  matched  to  the  parallel 
Kalman  filter  algorithms  to  achieve  further  reductions  in  computation  time. 


SECTION  3 


THE  WSMR  PARALLEL  COMPUTING  ARCHITECTURE  FOR 
KALMAN  FILTERING 


3.1  PARALLEL  KALMAN  FILTER  ARCHITECTURES  BASED  ON  THE  DECOUPLING  PRINCIPLE 

Now  that  the  systolic  computational  primitive  architectures  have  been 
summarized,  we  shall  develop  parallel  architectures  for  the  PKF  methods. 

The  computational  primitive  architectures  can  be  applied  to  the  PKF  when 
evaluating  the  RHS  functions  (e.g.,  f(x,t)  -  Fx(t),  etc.).  Because,  in 
general,  f(x)  can  be  linear  or  nonlinear,  generalized  or  systolic-like 
architectures  must  be  considered.  Hence,  the  computational  primitive 
architectures  for  linear  algebra  must  be  expanded  to  consider  nonlineari¬ 
ties.  With  this  in  mind,  generalized  systolic-like  architectures  are 
presented  in  the  remainder  of  this  section  for  implementing  the  trapezoidal- 
rule-based  PKF. 

3.1.1  PKF  Architectures  Based  on  the  Trapezoidal  Rule  (Two  Processors) 

In  the  last  section,  the  dual-processor  PKF  was  developed  based  on  the 
trapezoidal  rule.  In  particular,  Eqs.  (2.33)  to  (2.41)  define  the  method. 
The  systolic  architecture  for  the  PKF  predictor  equations  (2.33)  and  (2.36) 
can  be  most  easily  derived  from  the  signal  flow  graph  (SFG)  of  the  PKF 
equations.  The  SFG  of  this  method  is  shown  in  Figure  3-1.  Note  that  the 
computations  behind  the  computational  wavefront  can  proceed  in  parallel  on 
separate  processors  by  forcing  the  corrector  to  lag  the  predictor  by  one 
time  step.  As  it  turns  out,  this  is  fundamental  to  all  the  PKF  methods 
based  on  the  decoupling  principle.  In  this  diagram,  h  (the  integration 
step  size)  is  related  to  the  data  sample  rate  in  the  filter.  For  example, 
if  the  sample  rate  is  100  Hz  then  the  sample  period  is  1/100  «  0.01  ■  h  « 
integration  step  size.  Because  the  structure  of  the  PKF  state  update 
(predictor  and  corrector)  are  essentially  the  same  as  the  covariance  update 
(predictor  and  corrector)  without  loss  of  generality  the  SFG  and  svstolic- 
like  architectures  for  the  state  update  are  discussed  here. 
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Computational  Wavefront 

Figure  3-1 :  SFG  of  PKF  Based  on  Trapezoidal  Rule 

The  generalized  systolic  architecture  for  implementing  the  PKF  based 
on  the  trapezoidal  rule  can  be  derived  from  the  SFG  of  Figure  3-1  (see 
Figure  3-2).  Figure  3-2  shows  that  the  PKF  is  decoupled  into  two  parts: 
the  predictor  processor  architecture  and  corrector  processor  architecture. 
Note  that  both  architectures  utilize  the  traditional  inner-product  cell  and 
a  set  of  registers  to  hold  intermediate  values  of  x  and  f.  Note  also  in 
the  corrector  that  the  output  of  one  inner-product  cell  maps  directly  into 
the  input  of  the  next  inner-product  cell.  The  value  h/2  also  flows 
through  from  cell  to  cell. 

3.1.2  PKF  Architectures  Based  on  the  Trapezoidal  Rule  (Four  Processors) 

As  in  the  previous  section,  the  systolic-like  architecture  for  the 
four-processor  PKF  method  can  be  derived  from  its  signal  flow  graph  (SFG). 
The  SFG  for  the  method  (Eqs.  (2.46)  to  (2.59))  is  shown  in  Figure  3-3. 

Once  again  Figure  3-3  illustrates  that  the  computations  can  proceed  in 
parallel  becuase  the  computations  ahead  of  the  computational  wavefront 
depend  only  on  data  behind  the  front.  The  four  processors  consist  of  two 
processors  for  the  predictor  (A  and  B)  and  two  for  the  corrector  (A  and  B) . 
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Figure  3-2:  Dual-Processor  PKF  Architecture  Based  on  the  Decoupling  Principle 

The  gains  associated  with  each  processor  are  dependent  on  the  accuracy 
requirements  and  sampling  rate  through  the  step  size  parameter,  h. 

The  PKF  architecture  for  four  processors  shown  in  Figure  3-4a  and  b  is 
derived  from  the  SFG  in  Figure  3-3.  These  parallel  architectures  can  run 
simultaneously  providing  a  computational  speed  advantage  of  4x.  The  struc¬ 
ture  of  these  PKF  architectures  are  not  fundamentally  much  different  than 
the  two-processor  case  in  that  inner-product  cells  and  registers  are 
employed.  The  stack  size  and  number  of  inner-product  cells  has  increased 
however.  The  data  flow  with  the  cells  is  identical  to  the  two-processor 
case. 
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3.2  THE  WSMR  PARALLEL  COMPUTING  HARDWARE  ARCHITECTURE 


The  WSMR  parallel  computing  architecture  is  based  on  industry-standard 
components.  It  utilizes  an  open  architecture  centered  around  the  Vme  bus  so 
that  the  government  can  add  special  function  cards  to  the  testbed  as  desired 
(see  Figure  3-5) .  Twelve  Systolic-482  boards  provide  48  parallel  processing 
elements  and  three  Mbytes  of  fast  statis  RAM  as  local  memory. 

Local  processor-to-processor  communications  can  be  achieved  over  the  Vme 
local  bus.  On-board  parallelism  can  be  configured  by  each  Systolic-482 ’ s 
software  controlled  state  machine.  Global  communications  between  bulk  memory, 
the  twelve  Systolic-482  parallel  processing  cards  and  the  master  CPU  are 
carried  out  over  the  Vme  bus.  Sixteen  Mbytes  of  global  memory  is  provided  for 
main  program  memory  storage.  The  master  CPU  contains  one  Motorola  68030,  one 
68882  math  coprocessor  and  eight  Mbytes  of  RAM.  The  master  CPU  is  the  program 
task  scheduler  and  manages  the  PKF  algorithm  computations.  A  25  MFLOP  (64-bit 
array  processor)  is  also  proposed  for  accurate  FFT's,  solving  ill-conditioned 
linear  systems  of  equations  and  the  linear  algebra  in  the  Parallel  Kalman 
Filter . 

3.2.1  64-bit  Floating-Point  Array  Processor  Hardware 


The  linear  algebra  requirements  of  the  WSMR  testbed  can  be  most  effectively 
met  with  a  single-board  Vme  bus  array  processor.  At  32-bit  precision,  20  MFLOP 
array  processor  boards  are  commonly  available  for  the  Vme  bus.  At  64-bit 
precision,  however,  very  few  off-the-shelf  products  exist.  The  VORTEX  from 
Sky  Computers  for  example  performs  64-bit  floating-point  operations  at  an  eight 
MFLOP  rate.  To  meet  WSMR  throughput  requirements,  at  least  three  (perhaps  four) 
of  these  cards  are  needed. 

As  an  alternative  to  the  above,  we  recommend  that  a  next  generation  64-bit 
Vme  bus  array  processor  be  designed  by  Systolic  Systems'  technical  staff.  New 
chips  from  Weitek,  AMD,  TI,  IDT,  and  BIT  should  all  be  evaluated  for  potential 
design  into  the  WSMR  testbed.  We  should  design  in  the  fastest  technology 
available  to  ensure  an  extended  life  time  for  the  testbed. 

The  high-speed  linear  equation  solver  hardware  could  be  fabricated  as  a 
two-board  set  (see  Figure  3-6).  Board  II 1  supports  the  transfer  of  data  to  and 
from  the  host  and  a  high-speed  memory  array.  The  memory  array  design  is  multi- 
ported  to  allow  access  by  multiple  computational  boards  (e.g.,  Systolic-482 
cards  via  the  Vmx  port).  The  mathematical  operation  sequences  are  under 
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firmware  control  by  board  1i2.  Each  board  has  64-bit  chips,  one  multiplier  and 
one  adder.  Two  real-time  assembly  language  codes  shall  be  written  to  handle 
data  communications  between  the  host  and  the  Systolic  array  processor/FFT 
cards. 

3.2.2  Memory  Considerations 

The  parallel  Kalman  filter  algorithms  and  architectures  derived  earlier 
use  decoupling  to  permit  the  predictor  and  corrector  equations  to  be  computed 
on  separate  processors.  At  any  given  time  step  k,  the  state,  covariance, 
and  measurements  must  be  stored,  as  well  as  intermediate  values  associated 
with  the  linear  PKF's  matrix/vector  calculations.  For  a  typical  nine-state 
filter,  the  operation  count  given  earlier  indicates  that  the  number  of 
operations  in  the  standard  Kalman  filter  is 


additions : 


multiplications: 


divisions : 


n  x  n  +  2n  -  1 


2n  x  n  +  4n  +  1 


when  n  *  9.  The  data  storage  requirement  is  on  the  order  of  8  bytes  x 
(98  +  199  +  1)  *  2384  bytes  of  2  Kbytes  for  64-bit  precision.  If  the  linear 
two-processor  PKF  is  used,  it  must  be  initialized  by  running  the  standard  SKF 
for  the  first  two  tune  steps.  Then  the  dual  processor  PKF  can  be  run  at  step 
3  (i.e.,  at  k  «  3) .  Hence,  the  following  values  of  x,  4,  P,  z,  K,  H 
and  R  must  be  stored  in  memory. 


State  Vector  Values: 


State  Transition  Matrix  Values: 


Covariance  Gain  Values: 


Kalman  Gain  Values: 


Others : 


XQ(+),  X  (+)  ,  xQ(-),  x2(-) 


V  <tl‘  42 


PQ(+),  Pi(+>.  *!<->.  V”) 


H.  ,  H~ ,  R. ,  R„ 


Once  the  linear  processor  PKF  is  initialized,  memory  is  needed  to  store  the 
updated  values  of  x,  4,  P,  z,  H  and  R.  Hence,  in  general,  the  overall 
memory  requirement  is: 

Memory  *  (np  +1)  x  (storage  requirement  of  the  standard  Kalman  filter) 
where  np  =  the  number  of  parallel  processing  elements. 
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Thus,  (np  +1)  x  (2  Kbytes)  are  needed  to  store  the  data  in  the  parallel 
filter.  With  np  =  2,  this  corresponds  to  6  Kbytes.  With  np  =  48,  this 
corresponds  to  approximately  98  Kbytes  of  RAM. 

Note  that  the  above  memory  sizing  is  for  data  only.  The  PKF  program 
memory  has  not  been  sized.  Because  the  two-processor  PKF  algorithm  can  be 
coded  with  less  than  1000  lines  of  code  and  the  compiled  version  of  a  1000 


line  program  requires  about  128  Kbytes  of  RAM  to  store,  a  reasonable  estimate 
of  the  storage  requirement  for  the  linear  PKF  program  would  be: 

PKF  Program  Memory  *  np  x  128  Kbytes 

where  np  «  the  number  of  parallel  processing  elements.  With  48  processors, 
the  program  memory  is,  therefore,  estimated  to  be  48  x  128  Kbytes  »  6  Mbytes. 
The  four-processor  PKF  would  need  2x6  Mbytes  =  12  Mbytes  of  RAM. 

Hense,  a  parallel  processor  with  6  Mbytes  of  bulk  memory  (i.e.,  relatively 
slow  DRAM)  and  98  bytes  of  fast  RAM  (i.e.,  cache  memory)  should  be  capable  of 
implementing  a  48  processor  linear  PKF. 

Because  nonlinear  function  evaluation  generally  results  in  more  inter¬ 
mediate  values  than  linear  matrix/vector  operations,  the  amount  of  local  data 
storage  might  be  increased  by  a  factor  of  2.5x.  Hence,  2.5  x  98K  -  256K  of 
fast  RAM  is  recommended  for  nonlinear  extended  parallel  Kalman  filter  data 
storage.  Twelve  to  sixteen  Mbytes  of  program  memory  should  be  sufficient, 
however,  for  the  nonlinear  PKF. 

3.2.3  Parallel  Processor  Selection 

One  method  of  estimating  the  computational  requirements  for  the  parallel 
Kalman  filter  is  to  total  the  number  of  additions,  multiplications  and 
divisions  needed  to  complete  one  cycle  of  the  Kalman  filter  algorithm.  For 
example,  the  simple  Kalman  filter  algorithm  defined  in  Section  1  requires  only 
98  additions,  199  multiplications  and  1  division  per  cycle  for  a  nine-state 
filter.  Hence,  at  100  cycles  per  second  (i.e.,  100  Hz  sample  rate)  the  number 
of  arithmetic  operations  is  given  by  100  x  298  -  29,800  operations  per  second. 
Ideally,  a  microprocessor  capable  of  33.1  usee  per  operation  is  all  that  is 
needed  to  implement  a  nine-state  filter.  Hence,  a  single  Motorola  68020/68881 
pair  can  easily  handle  the  computational  requirements  of  the  Kalman  filter 
assuming  100%  efficiency.  Note  that  33.6  usee  per  pass  through  the  filter 
corresponds  to  an  update  rate  of  39,800  samples  per  second. 
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Typically,  however,  only  10  to  30%  of  peak  performance  is  achieved  in 
practice  due  to  data  bus  and  memory  access  time  restrictions.  Therefore,  one 
target  may  be  updated  at  a  4000  updates-per-second  rate.  For  nonlinear 
filtering  typical  of  target  tracking  problems,  64-bit  precision  and  the  need 
to  compute  trigonometric  functions  for  coordinate  transformations  can  slow 
computations  down  by  one  or  perhaps  two  orders  of  magnitude  (lOx  to  lOOx)  . 

Since  it  is  well  known  that  Kalman  filtering  must  be  performed  using 
floating-point  arithmetic  to  avoid  stability  problems,  the  only  viable  method 
to  gain  back  the  throughput  for  nonlinear  filtering  problems  using  an  extended 
Kalman  filter  is  with  parallel  processing.  Therefore,  to  rapidly  implement 
the  parallel  Kalman  filter  with  32/64-bit  floating-point  precision,  a  parallel 
processing  system  of  48  processors  needs  a  computation  rate  of  2.52  MFLOPs  per 
processor  to  perform  the  necessary  computations.  Using  four  25  MHz  Motorola 
68882  math  coprocessors  per  board,  2.52  MFLOP  performance  is  readily  achieva- 
able.  Hence,  with  twelve  boards  it  is  feasible  to  track  one  target  in  real 
time. 

The  general-purpose  nature  of  the  Motorola  68020/68882  processors  is 
well  suited  for  nonlinear ,  as  well  as  linear,  Kalman  filtering.  In  particular 
because  trigonometric  functions  (sin,  cos,  tan,  etc.)  and  square-roots 
commonly  occur  in  coordinate  transformations  associated  with  lead-angle 
prediction,  high  speed,  general-purpose  hardware  (such  as  the  Systolic-481 
parallel  numeric  processor)  is  needed  to  handle  the  throughput  requirements. 

3.3  THE  VJSMR  PARALLEL  COMPUTING  SOFTWARE  ARCHITECTURE 

The  testbed  software  architecture  can  be  divided  into  three  major  areas: 
(1)  host-to-testbed  communication  software,  (2)  Master  Processor  embedded 
software  and  (3)  Systolic-482  parallel  numeric  processor  software.  This 
section  describes  our  system  integration  efforts  related  to  implementing  the 
PKF  equations  on  the  testbed. 

3.3.1  Host-to-Testbed  Communication  Software 


The  testbed  is  a  "compute  engine"  for  rapidly  evaluating  nonlinear 
functions  for  the  PKF.  The  testbed  does  not  have  its  own  compiler  or  linker. 
Software  development  for  the  testbed  is  performed  on  a  "host-computer"  which 
is  familiar  to  the  user.  Since  the  testbed  utilizes  the  32-bit  Motorola  68030 
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microprocessor,  a  Motorola  development  system  or  host  such  as  a  SUN  3/260  work¬ 
station  or  Macintosh  11  (which  both  use  the  68020)  provides  a  direct  path  for 
software  development  for  the  testbed.  If  a  VAX  or  IBM  computer  is  used  as  a 
development  system  for  the  testbed,  a  cross  compiler  is  needed  to  create 
native  code  for  the  68020  in  the  testbed.  Since  Systolic  Systems  has  a  SUN  3 
and  Macintosh  II,  these  systems  have  been  used  for  software  development  for 
the  testbed. 

Once  created,  an  application  such  as  the  PKF  code  must  be  downloaded 
to  the  testbed,  executed  and  the  results  uploaded  back  to  the  host.  The 
performance  of  this  procedure,  commonly  employed  in  the  industry,  depends 
on  the  data  transfter  link  between  the  host/testbed.  An  RS-232C  serial  link 
using  Motorola  S-record  format  has  been  created  and  tested.  A  high-speed 
RS-422  link  is  being  developed.  These  two  approaches  provide  standard  inter¬ 
faces  to  the  testbed  and  are  quite  useful,  even  if  an  ethernet  link  is 
developed  between  the  host/testbed. 

3.3.2  Testbed  Master  Processor  Software 

The  real-time  software  in  the  master  processor  is  responsible  for  overall 
system  operation  and  managing  the  interactions  with  the  host/user.  From  a 
parallel  computing  point  of  view,  the  master  is  a  task  scheduler.  It  schedules 
tasks  (i.e.,  f loating-pomt  operations,  subroutines)  to  execute  on  one  or  more 
Systolic-482  cards.  As  discussed  in  the  next  section,  tasks  tend  to  involve 
quad  (four)  simultaneous  floating-point  operations,  such  as  ADD,  SUBTRACT, 
MULTIPLY,  DIVIDE,  TRIG,  SQUARE-ROOT.  Also,  linking  of  tasks  into  a  string 
and  routing  these  tasks  to  available  482  cards  is  the  responsibility  of  the 


master  processor, 


The  self-test  diagnostics  for  the  Systolic-482  include  a  memory  test,  LED 
status  indicators,  bus  integrity  test,  floating-point  integrity  test  and  mail¬ 
box  activity  test.  The  482  memory  test  includes  a  walking  ones  and  zeroes  test 
through  all  256  Kbytes  of  its  local  memory.  A  mix  of  instructions  with  known 
answers  is  used  to  test  the  68020  microprocessor.  Light  Emitting  Diodes  (LEDs) 
are  used  for  status  indicators.  The  LEDs  light  each  time  the  482  executes  a 
major  piece  of  software  correctly  indicating  the  482  is  healthy.  Known  data  is 
also  passed  from  EPROM  to  RAM  and  compared  to  verify  that  the  32-bit-wide  data 
bus  on  the  482  is  operationally  sound.  A  known  set  of  floating-point  numbers 
is  also  used  to  verify  that  the  four  68882  math  coprocessors  can  accurately 
perform  floating-point  operations  reliably.  Additionally,  known  messages  are 
passed  through  the  482's  mailbox  to  ensure  its  integrity.  Comprehensive 
diagnostic  test  routines  were  written  to  exercise  all  the  above  back-to-back 
and  display  the  pass/fail  results  for  several  thousand  passes.  This  diagnostic 
self-test  software  can  be  involved  by  the  user  to  "check"  the  testbed  at  any 
time. 


3.3.5  Floating-Point  Operations 

Although  the  482  can  operate  in  32-bit,  64-bit  or  80-bit  extended  preci¬ 
sion,  the  floating-point  test  code  was  written  for  64-bit  floating-point 
operation.  The  64-bit  test  numbers  can  be  entered  using  the  VFILL  ("vector 
file")  routine  to  allow  the  user  to  pick  which  numbers  to  test.  Since  the 
answer  is  known  by  the  user,  he  can  type  the  answer  in  and  the  software  can 
subtract  the  numbers  and  display  the  error  (if  any).  Thus,  this  test  can  be 
tailored  by  the  user. 

Floating-point  operations  on  the  482  can  be  computed  as  singles  (on  one 
68882  math  coprocessor)  or  quads  (on  four  68882s  simultaneously) .  Singles 
are  useful  for  truly  "scalar"  processing.  Quads  are  desireable  for  vector 
operations  that  are  "chained  together"  four  elements  at  a  time.  Long  vectors 
(say  greater  than  32  elements)  can  be  computed  on  a  single  482  card  or  eight 
to  twelve  cards  depending  on  task  loading. 

The  quad  (four)  parallel  floating-point  primitives  in  the  Systolic-482 
include : 


Ai/D ,  SUB,  MULT,  DIV,  NOP,  TAN,  COS,  SIN,  SQRT,  .  etc. 


Jrx>«*V  SS&Srtft  /.*■? 


The  482  has  on-board  error  detection  code  for  the  following  error 
conditions:  bus  error,  address  error,  illegal  instructions,  division  by  zero, 
and  floating-point  coprocessor  overflow/underflow  detection.  There  error  codes 
are  accessible  through  the  mailbox  to  the  master  in  event  of  a  fault  condition. 
Hence,  the  master  can  take  appropriate  action  (e.g.,  inform  user  by  illuminating 
a  special  combination  of  LEDs). 

3.3.7  Mailbox  Error  Codes 

The  482' s  mailbox  scheme  has  its  own  set  of  status  conditions.  These 
include:  mailbox  empty,  mailbox  not  empty,  no  error  present  in  mailbox,  and 

mailbox  unknown  (i.e.,  confused).  The  status  of  the  mailbox  is  available  to 
the  testbed's  master  and  each  482  card  in  the  testbed.  The  issue  of  multi¬ 
access  to  the  mailbox,  deadlock  (i.e.,  system  fault  when  multi-access  is 
attempted)  and  task  priority  are  topics  still  under  consideration. 
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SECTION  4 


PARALLEL  KALMAN  FILTER  ALGORITHM  VALIDATION 


To  show  the  effectiveness/payoff  of  the  parallel  Kalman  filter  (PKF) 
testbed  research,  it  is  important  to  consider  a  meaningful  target  tracking 
problem.  The  test  problem  should  be  representative  of  typical  missile  appli¬ 
cations  and  serve  as  a  baseline  to  measure  the  benefits/accuracy  of  the  PKF 
technology.  With  this  in  mind,  two  test  problems  are  considered.  The  first 
problem  (test  case  it  1)  is  very  simple,  utilizes  time-varying  parameters,  and 
illustrates  the  speed/accuracy  of  the  PKF  method.  Test  case  it 2  is  more 
realistic  and  more  challenging  since  it  involves  nonlinearities  and  exponen¬ 
tials. 

4.1  TEST  CASE  £1 

Test  case  ill  i6  based  on  the  following  state  and  measurement  model: 

x (k+1 )  -  b  (k+1 , k)  x (k)  +  T (k+1 , k)  v(k)  (4.1) 

z (k+1 )  -  H (k+1 )  x(k+l)  +  v (k+1 )  (4.2) 

where  x  is  the  state  of  the  target,  b  is  the  state  transition  matrix, 
z  are  measurements  and  w  and  v  are  noise  terms. 

For  the  purpose  of  test  case  ill,  the  model  parameters  were  selected 
as  follows: 

4(k+l,k)  -  exp  (-k*0 . 001 ) ,  r(k+l,k)  -  1,  H(k+1)  «  1  (4-3) 

The  noise  terms 

w(k)  'v  N(0,Q(k))  and  v(k+l)  'v  N(0,R(k+l)  (4.4) 

had  zero  mean  and  covariance  Q(k)  and  R(k+1),  respectively. 
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In  this  case,  the  PKF  equations  are  given  by: 


procedure  new  PKF  (i:integer); 
begin 

H  [i+1) :  1  1; 

A  [i+1, i]  :  *=  *xp  (*i*0.001)  ; 

A  (i+2 , i+1] :  *  ®xp  (-  (i+1) *0.001) ; 

B  (i+1, i) :  =  D 
B  [i+2  ,  i+1)  :  =  1  >' 

{  predictor  ) 

X  [i  +  2, i  +  1):  «  A  [i  +  2, i+1)  *  A  U  +  l.i  *  *  i'i  V.  *  [i  +  l.il  *  A  [i  +  2 ,  i  +  1) 

p  (i+2  -  ;  !  o  [iS  *  B  !i+l.i)  .  A  [i+2,  i  +  1) 

+  B  [i+2, i+1)  *  o  [i+1)  *  B  [i+2, i+1); 

K  [i+^:man  (i+l.i)  *  H  [i+13  )  /  ((H  I  i+D  *  P  *  »  Ii  +  D>  "  R 

[i+1) )  ; 

x  [1+1?I+1):°=  i  [i+i.i)  +  K  [i+1]  *  (2  [i+D  ;  IH  [i+l]  *  X  [i+1,  i) ) )  ; 

P  [i+1, i+1):  =  (1  *  (K  (i+1)  *  «  [i+l)5)  *  P  Ii+l-i)> 
end;  1  new  PKF  ) 


Similarly,  the  SKF  equations,  when  coded,  appear  as  follows: 

procedure  SKF  (i:  integer); 
begin 


A  u+1. i)-’  *  exp  (-i*0.001); 

B  [i+1 ,  i)  :  =  1; 

K  [i  +  1]  :  =  1 

X  [i+1,  i]  :  =  A  [i+1 ,  i]  *  X  [  i.i]; 

P  [i+1 ,  i)  :  *=  A  [i+1 ,  i]  *  P  [i.i]  *  A  [i+l.i]  +  B  (i+l.i)  *  0  U)  * 

B  fi+i.i); 

K  fi+lj  :  ■  P  (-  fi+l,i)  *  H  [i+1))/  i  (H  [i+1]  *  P  [i+l.i)  ’  H 

[i+1]  )+  R  [i+1]  )  ; 

X  [i+1,  i+1]  :=  X  [i+1, i]  +  K  [i+1]  •  (2  [i+1]  -  (H  [i+1]  *  A  [i+l.i]  * 
X  [i+l.i])); 

P  [i+l, i+1]  :*  (1  •  (K  [i+1]  *  H  [i+1]))  *  P  [i+l.i]; 
end;  {  SKF  ) 


The  results  for  this  time-varying  case  are  shown  in  Table  4-1.  The  results 
indicate  that  the  PKF  equations  are  well  behaved  and  exhibit  similar  convergence 
characteristics  as  the  SKF.  The  optimal  solution  can  be  obtained  via  one  pass 
through  the  SKF  equations  once  the  PKF  converges. 

Since  the  computer  simulations  on  these  test  casees  were  encouraging,  the  four- 
processor  PKF  equations  were  coded  (see  below)  with  similar  results. 


TABLE  4-1  COMPARISON  OF  THE  SKF  AND  PKF  ESTIMATES  OF  AN 
EXPONENTIALLY  DECAYING  TARGET  TRAJECTORY 


Saguantial  Kalman  Filtar 

P[0,0) :10. OOOOOO  X(0,0) 

P[l.l)  :  0.099020  X(l,l] 

P[2 . 2] :  0.066535  X(2,2] 

P(3. 3)  :  0.062444  X[3,3] 

P  [4 . 4  ]  :  0.061842  X[4,4] 

P  [5 , 5]  :  0.061737  Xf5,5] 

P[6,6] :  0.061704  X(6,6] 

p[7.7]  :  0.061681  X[7,7] 

p  (6, 8]  :  0.061660  X(8,8] 

p  [9 , 9)  ;  0.061639  X(9,9] 

P[10, 10J  :  0.061618  X[10,10) 

P[ll.ll] :  0.061597  X[ll,ll] 

P (12 . 12]  :  0.061576  X[12,12] 

P  [  13 , 1 3)  :  0.061556  X(13.13) 

P  [  1 4 , 1 4 ) :  0.061535  X(14,14) 

P  (15 . 15)  :  0.061514  X[15,15) 

P  (16 , 16]  :  0.061493  X(16.16] 

p(17, 17]-.  0.061473  X  [17 . 17] 

P (1 6 , 1  8)  :  0.061452  X(16,18] 

P(19, 19]  :  0.061432  X(19.19] 

P  (20 ,20)  :  0.061411  X(20.20] 

P  1 21 , 21]  '.  0.061390  X  (21 , 21  ] 

P [22, 22] :  0.061370  X(22,22] 

P  (23 .23]  :  0.061349  X(23,23] 

P  [24 , 24]  :  0.061329  X[24,24; 

P  (25, 25]  :  0.061306  X(25,25; 

P  (26 , 26]  :  0.061288  X(26,26 

P  [27 , 27]  :  0.061268  X(27.27 

P  (28 , 28]  :  0.061247  X(28,28]: 

P  (29 , 29]  :  0.061227  X[29,29): 

PI 30, 30]  :  0.06120 7  X(30,30): 

Parallel  Kalman  Filter 

[0.0]  :  10.000000  X(0,0]  : 

p  [  1 , 1 ] ;  0.099010  Xfl.l] 

p  (2 , 2]  •.  0.099027  X(2,2] 

p [ 3 , 3 ]  :  0 . 074676  X[3.3] 

p  ( 4 , 4  ]  :  0.074839  X[4,4] 

p  (5 . 5]  :  0-073193  X[5,5) 

p  (6 , 6]  :  0.073155  X(6,6] 

p  [7 , 7]  :  0.073003  X[7,7] 

p ( 8 , 8]  :  0.072965  X(8,8] 

p  [9 . 9]  ;  0.072919  X(9,9] 

PJ10.10):  0.072881  X(10,10] 

P [11 , 11) :  0.072842  X[ll.ll] 

P  [12, 12]  :  0.072804  X(12,12] 

P  [13 , 13]  :  0.072767  X[l3.l3] 

P  f 1 4 , 1 4  J  :  0.072729  X[14,l4] 

P  [15, 15]  :  0.072691  X(lS,l5) 

P  (16 ,16]  :  0.072653  X[16,l6] 

p  [17 ,17]  :  0.072616  X[17,17J 

P  [  16 , 1 8]  :  0.072578  X(16.18] 

P (19 , 19]  :  0.072540  X(19.19) 

P(20,20]:  0.072503  X[20,20j 

P[21.21):  0.072465  X(21,21. 

P  (22 . 22]  :  0.072428  X(22,22] 

P  [23 , 23]  :  0.072390  X(23,23; 

P  (24 ’ 24]  :  0.072353  X(24,24 

P (25 ! 25] :  0.072315  X(25,25 

P  [26 , 26]  :  0.072278  X(26,26 

P  [27 , 27]  :  0.072240  X[27,27 

P  [28, 28]  ".  0.072203  X[26,28 

P  [29 , 29] :  0.072166  X[29,29 

P  (30 . 30]  :  0.072128  X[30,30 


1.000000  2(0):  1.000000 

1.000001  2(1]:  1.000001 

0.999666  2(2]:  0.999002 

0.998501  2(3]  :  0.997007 

0.996434  2 [4] :  0.994022 

0.993419  2(5]:  0.990055 

0.989442  2(6) :  0.985118 

0.984501  2(7):  0.979226 

0.978609  2(8] :  0.972396 

0.971781  2(9]  :  0.964649 

0.964037  Z  (10)  :  0.956007 

0.955396  Z [11] :  0.946496 

0.945891  Z [ 12] :  0.936142 

0.935542  Z 1 13)  :  0.924977 

0 . 924381  Z  (14)  :  0.913031 

0.912441  Z [ 1 5) :  0.900339 

0  899755  2(16)  :  0.986935 

0.886358  Z (17) :  0.872856 

0.872289  Z ( 1 8 ] :  0.858146 

0.857584  Z [19] :  0.842839 

0.842285  2(20] :  0.826977 

0.826432  2(21]:  0.810603 

0.810067  2(22) :  0.793759 

0.793232  Z (23) :  0.776488 

0.775971  Z [ 2  4 )  :  0.7SB833 

0.75B326  Z[25):  0.740839 

0.740342  Z  [26]  :  0.722549 

0.722063  2(27):  0.704005 

0.703530  Z (28) :  0.685253 

0.684789  Z(29):  0.666333 

0.665880  2(30)  :  0.647287 


1.000000  Z(0):  1.000000 

1.000002  2(1):  1.000002 

0.999003  2(2):  0.999003 

0.997008  2(3):  0.997008 

0.994022  2(4):  0.994023 

0.990055  2(5):  0.990056 

0.985116  2 | 6] :  0.985119 

0.979226  2(7)  :  0.979227 

0.972396  2 [ 8) :  0.972397 

0.964649  2(9):  0.964650 

0.956007  Z (10) :  0.956008 

0.946496  2111]:  0.956497 

0.936143  Z  (12)  :  0.936143 

0.924977  2(13):  0.924978 

0.913031  2(14):  0.913032 

0.900339  Ztl5):  0.900340 

0.886935  Z  (16)  :  0.986935 

0.872858  Z[17):  0.872859 

0.858146  Z  [18]  :  0.858147 

0.842639  Z (19) :  0.842840 

0.826977  Z (20) :  0.826978 

0.810603  Z [21] :  0.810504 

0.793759  Z [22] :  0.793760 

0.775488  Z  (23)  :  0.776489 

0.758833  Z  [ 2 4 )  :  0.758834 

0.740839  Z [25] :  0.740840 

0.722549  Z [26] :  0.722550 

0.704006  Z (27) :  0.704006 

0.685253  Z  [28]  :  0.685254 

0.666333  Z (29) :  0.666334 

0.647266  Z [30) :  0.647289 


4.2  TEST  CASE  1)2 

Consider  the  problem  of  estimating  the  position  of  an  object  (missile) 
from  angle-only  (or  range)  measuremements .  The  geometry  of  this  target 
tracking  application  is  illustrated  in  Figure  4-1.  Typical  parameter 
values  for  this  exercise  are  given  in  Table  4-2, 


/  T  s 


Missile 

/  T 


*ACAR  / 


Figure  4-1  Geometry  of  Test  Case  #  1 


In  Figure  4-1,  xt,  yt  represent  the  target  position  in  X-Y 
coordinates  and  yg  represents  the  sensor  position. 

The  target  range  is  given  by 


2t  *  +  (yt  *  ys)2)i5 


(4.5) 


With  angle-only  measurements  the  line-of-sight  angle,  5,  can  be  estim; 
as  follows: 


(4.6) 


TABLE  4-2 


The  target's  motion  is  modeled  as  a  falling  body  in  state  variable  form  as: 


X1  *  yt’  x2  *  “  yt.  x3  “  1/e>  x3  *  °*  x4  *  xt. 


xa  “  0,  *5  -  yg,  x5  -  0 


(4.7 


where  6  is  the  so-called  ballistic  coefficient  of  the  target  and  y£  is  the 
target's  height  above  the  earth. 


The  equations  of  motion  for  the  target  are: 


P 


?oe 


-Xl/kp 


(4.8 


(4.9 


where  d  is  drage  deceleration,  g  is  acceleration  of  gravity,  p  is  atncs 
pheric  density  (with  pQ  the  atmospheric  density  at  sea  level)  and  kp  is  a 
decay  constant.  The  differential  equation  of  velocity,  X£,  is  nonlinear 
through  the  dependence  of  drag  on  velocity,  air  density  and  the  ballistic 
coefficient,  6. 

Initial  values  of  the  state  variables  are  assumed  to  have  covariance 
matrix  of  the  from 

P  *  diag  (Pii  >  P22  »  P33  »  P44  *  P55  ^  (4.10) 

0  o  *”o  o  0  o 


Estimating  all  the  state  variables  may  be  solved  using  the  following 
extended  Kalman  filter: 

Predictor : 

X(t)  -  f  (X  (  t  )  )  X  (tg)  -  *0 

P(t)  -  F(x(t))P(t)  +  P(t)FT(x(t))  +  Q ( t ) ,  P(r  )  -  P„ 


(4.11) 
(4. 1?'! 


yvi- 


k'xiU'O'i.rkt’i.i'iJS 


Corrector : 


xk(+)  «  x(-)  +  Kk(zk  ~  hk(*k(_))> 

Pv(+)  -  (I  -  KkHk(xk(-)))Pk(-) 

\  -  Pk(-)Hk(xk(-))  *  (Hk(xk(-))Pk(-)  Hj(xk(-))  +  Rk)_1 


(4.13; 

(4.14) 

(4.15) 


where 


F (x (t) )  - 


Hk(x(-)) 


5  f (x (t) ) 
3x(t) 


X  ( t )  -  x(t) 


3h(x(tk)) 

3x(tk) 


x(tk)  -  x(-) 


The  nonlinear  differential  equations  were  integrated  using  the  trape¬ 
zoidal  rule.  The  partial  derivatives  were  computed  using  a  finite  differ¬ 
ence  method.  The  model  uncertainty  matrix,  Q,  and  the  measurement  noise 
covariance,  R,  were  given  by: 

Q  «  diag  (0.2,  0.2,  0.2,  0.2,  0.2)  (4.16 
R  *  diag  (0.2,  0.2)  (4.17 


PARALLEL  TRAPEZOIDAL  RULE 

After  careful  inspection  of  the  nonlinear  extended  parallel  Kalman 
filter  (EPKF)  formulation,  it  is  clear  that  the  performance  of  the  filter 
is  closely  related  to  the  accuracy  and  speed  of  the  numerical  method  used 
for  propagating  the  state  and  covariance  time  updates.  The  trapezoidal 
rule  discussed  in  the  previous  section  is  accurate  to  0(h  ).  For  h  «  0.01 
four-digit  accuracy  is  attainable.  With  h  ■  0.001,  six-digit  accuracy 
results.  Automatic  step  size  (h)  selection  based  on  a  prespecified  accuracy 
requirement  (e.g.,  norm  of  the  local  truncation  less  than  0.000001)  can  be 
achieved  using  a  variable  step  size  integration  scheme.  Time  to  solution 
can  also  be  used  a  a  criteria  for  selecting  h.  Since  it  is  anticipated 
that  computation  time  is  key,  the  parallel  methods  are  evaluated  with  a 
relatively  large  step  size  (h  «  0.01  for  each  trapezoidal  rule  being 
evaluated) . 


v.V 

(4.16) 

1  -  *  - 

(4.17; 

• 
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4.3.1  Parallel  Solution  of  Test  Case  #1 


The  target's  equations  of  motion  were  used  to  demonstrate  the  numeri¬ 
cal  properties  of  the  parallel  trapezoidal  rule.  The  initial  conditions 
and  physical  problem  parameters  were  described.  The  results  of  propagating 
the  target  altitude  (.xl)  and  target  velocity  (,x2)  with  the  parallel 
trapezoidal  methods  are  shown  in  Tables  4-3  and  4-4. 

The  results  in  Tables  4-3  and  4-4  are  based  on  a  fixed  step  size  of 

h  ■  0.01.  The  data  indicates  that  the  absolute  error  between  the  sequen¬ 

tial  trapezoidal  rule  and  the  parallel  trapezoidal  rule  is  relatively  small 
(  0.1%).  However,  the  absolute  error  for  the  target  velocity  is  growing 
with  time.  Although  at  first  glance  this  may  appear  unacceptable,  a  more 
detailed  examination  indicates  that  the  change  in  the  percent  error  norma¬ 
lized  by  the  change  in  the  variable  being  integrated  is  only  1.15x10  ^  X 
per  ft  for  the  target  altitude  and  4.4x10  per  ft/sec  for  the  target 
velocity.  Since  the  flight  time  for  this  application  is  only  15  to  20 
seconds,  the  maximum  absolute  error  on  impact  for  the  target  altitude  is 
0.11%  and  for  the  target  velocity  0.55%.  If  not  to  improve  accuracy,  a 
smaller  step  size  (say  h  ■  0.001)  is  worth  evaluating. 

The  results  in  Tables  4-5  and  4-6  are  based  on  a  fixed  step  size  of 

h  ■  0.001.  Note  that  in  each  case  the  parallel  trapezoidal  methods  are  as 

accurate  as  the  standard  sequential  trapezoidal  rule.  Note  that  the  worst- 
case  maximum  error  is  much  less  than  0.01%.  Thus,  with  a  small  step  size, 
the  parallel  methods  are  equally  good  with  the  sequential  methods  only  400% 
faster  to  compute  per  state  equation.  A  trade  off  between  speed  and 
ultimate  accuracy  must  be  made. 

4.3.2  Summary 

The  results  of  this  section  indicate  that  as  more  processing  elements 
are  used  to  speed  up  the  PKF  state  (and  covariance)  propagation,  the 
accuracy  of  the  parallel  solution  can  degrade  compared  with  satndard 
sequential  methods.  Thus,  because  the  EPKF  utilizes  the  parallel  trape¬ 
zoidal  rule  to  integrate  the  state  and  covariance  equations,  the  EPKF 
solution  should  be  less  accurate  compared  with  the  EKF.  The  execution 
speed  of  the  EPKF  should  improve  linearly  with  the  number  of  processors. 


1  .v.v.v.v^.  ^ 
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Table  4-3  State  Equation  Integration  Comparing  the  Trapezoidal  Rule 
and  the  244  Processor  Parallel  Trapezoidal  Rule  (h=0.01; 


Trapezoidal  Rule 

Parallel  Trapezoidal  Rule 

1  Processor  2 

Processors  4  Processors 

Time 
( sec ) 

Target 

A1 t i tude 
(ft) 

Target 

Altitude 

(ft) 

Target 

Altitude 

(ft) 

Absolute 

Maximum 

Error 

0.500 

290001  .923 

290001 .923 

290001  .923 

0.07% 

1  .000 

280012.071 

280012.070 

280211 .725 

0.07% 

1 .500 

270039.727 

270039.726 

270238.905 

0.07% 

2.000 

260099.339 

260099.339 

260297.685 

0.07% 

2.500 

250213.191 

250213.192 

250410.158 

0.08% 

3.000 

240415.165 

240415.165 

240609.927 

0.08% 

3.500 

230755.644 

230755.636 

230946.984 

0.08% 

4.000 

221307.016 

221306.981 

221493.216 

0.08% 

4.500 

212168.020 

212167.913 

212346.788 

0.08% 

5.0C0 

203463.303 

203463.031 

203631 .846 

0  •  0  8  <4 

Table  4-4 

State  Equation  Integration  Comparing  the  Trapezoidal  °ule 
and  the  244  Processor  Parallel  Trapezoidal  Rule  (h=0.u.) 

Trapezoidal  Rule 

Parallel 

Trapezoidal  Rule 

1  Processor 

2  Processors  4  Processors 

Time 

Target 

Target 

Target 

Absolute 

(  sec ) 

Velocity 

Velocity 

Velocity 

Max imum 

( ft/sec) 

( ft/sec) 

( ft/sec) 

Error 

0.500 

19990.367 

19990.369 

19990.690 

0.0016% 

1  .000 

19966.004 

19966.006 

19966.692 

0.0034% 

1  .500 

19918.640 

19918.642 

19919.896 

0.0063% 

2.000 

19835.603 

19835.606 

19837.737 

0.0107% 

2.500 

19697.851 

19697.859 

19701 .325 

0.01 76% 

3.000 

19477.635 

19477.654 

19483.112 

0.0281% 

3.500 

19136.462 

19136.509 

19144.840 

0.0437% 

4.000 

18625.119 

18625.229 

18637.499 

0.0665% 

4.500 

17888.965 

17889.210 

17906.486 

0.0979% 

5.000 

16882.258 

16882.759 

16905.719 

0.1389% 
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Table  4-5  State  Equation  Integration  Comparing  the  Trapezoidal  Rule 
and  the  264  Processor  Parallel  Trapezoidal  Rule  (h=0.001) 


Trapezoidal  Rule  Parallel  Trapezoidal  Rule 


Time 
(  sec ) 


0.500 

1  .000 

1  .500 

2  .000 

2.500 
3.000 

3.500 
4.000 

4.500 
5.000 


1  Processor 


Target 
A1 t i tude 
(ft) 


290001 . 
280012. 
270039. 
260099. 
250213. 
240415. 
230755. 
221306. 
212167. 
203463 . 


2  Processors 


Target 

Altitude 

(ft) 


290001 .924 
280012.072 
270039.728 
260099.341 
250213.192 
240415.163 
230755.630 
221306.970 
212167.895 
203463.006 


4  Processors 


Target 
A1 1 itude 
(ft) 


290021 .914 
280032 .038 
270059.647 
260119.176 
250232.890 
240434 .641 
230774.766 
221325.595 
212185.784 
203479.889 


Table  4-6  State  Equation  Integration  Comparing  the  Trapezoidal  Rule 
and  the  264  Processor  Parallel  Trapezoidal  (h=0.001) 


Trapezoidal  Rule  Parallel  Trapezoidal  Rule 


Time 
( 6ec) 


1  Processor 


Target 
Ve loc i ty 
(ft/ sec) 


2  Processors 


Target 
Velocity 
( ft /sec) 


4  Processors 


Target 
Velocity 
( ft/sec) 


19990.367 

19990.367 

19990.399 

19966.005 

19966.005 

19966.073 

19918.641 

19918.642 

19918.767 

19835.606 

19835.606 

19835.819 

19697.860 

19697.860 

19698.207 

19477  .656 

19477.656 

19478.203 

19136.512 

19136.512 

19137.347 

18625.234 

18625.234 

18626.463 

17889.215 

17889.215 

17890.946 

16882.764 

16882 . 764 

16885.064 

Table  4-7  2  Processor  Parallel  Trapezoidal  Rule 


douole  xlprSiCCl  ,  x3p[5lCt?l  ,  xlc[5iCCl 
,  x3'c  ZSltfi?;  ,  rhop[51i?Cl  ,  rhoc  [51CC1  ; 

double  rho,  beta,  x3,  5,  x4,  x5,  d,  h  ; 

double  xl  ,  xZ  ,  crag  ,  x  ,  krho  ,  rho®  ,  stirne  ; 
int  i  ,  nstep  ,  r.print  ; 

#  i  nr  '  <rnat  h .  h ) 

S  incl  ure  < st c j  o.  h) 

rna  u>() 

-C 

F  I  LF  *tc«ul.  1  i 

t  out  1  =•  foper,  (  "tdemol . out "  ,  ”w"  )  ; 

/***  xlp  denotes  xl  pred.  xlc  denotes  xl  corrector  ***/ 
xlpCCl  *  xl  ;  xlcCiPl  =  xl  ;  xZe [Cl  =  x£  ;  x£p[Cl  =  x£  ; 

r-hopZCl  =  rhoiZi*exp(  -x 1 p [Cl /krho)  ;  rhocZCl  =  rhoC*exp(  -xlcCCD/krho  ) 
x 1 p  l 1 1  *  xl  ;  rhopCll  =  rhoC*exp<  -xlpCll/Krho  )  ; 

xZp[  11  «=  xZ  ; 

f  or  (  i  *1  ;  i  <  =  r,step  ;  +■*•  l  )  i 

/**#  preoictor  equations  ***/ 

rhopCil  =  rhoC*exp(  -x 1 p [ i 1 / krho  )  ; 

rhoc[i-il  =  rhoC*exp<  -x  1  c  [  l  - 1 1  / krho  )  ; 

Grac,  =  rnop 1 l 1 *x£p [ l 1 *xZp [ i 1 /£/ x3  ; 

xlpli+ll  =  xlcCi-11  +  £. *h* (  -x£p[il  )  ; 

xZpCi+ll  =  xZc[i-ll  £'.*h*(  g  -  rhop  C  1 1  *  xZp  [  i  1  *x£p  [  1 1 /£/ x  3  )  ; 

/***  corrector  equations  ***/ 

xlcCil  =  xlc[i-l3  +  .5*h*(  -xZpCil  -  xZcEi-11  )  ; 

xZcLil  *  xZc[i-ll  h*  g  -  (h/4.)*(  rhop  C 1 1  *  xZp  [  1 1  *xZp  i  1 1  /  x3 

rhoc  C  i  - 1 1  *x£c  [  l - 1 1  *x£c  C  i - 1 1 /x3  )  ; 


st  ime  *  i*h  ; 

nprint  «  nstep/lC  ; 

if(  <  i /r,pr  int )  *r,pr  ir,t  *=  l  X 

f  pr  i  r,t  f  (t  out  1 ,  "\ri\n  time  =  *7.  3f  sec  xl  «  %7.  3f  ft  xZ  =  %1 .  3f  ft/sec 
,  stirne  ,  xlcCil  ,  xZeCil  )  ; 

pr  l  nt  f  (  " \ri\n  time  *  %7.  3f  sec  xl  *  *7.  3f  ft  xZ  =  %7.  3f  ft/sec  " 

,  stirne  ,  xlcCil  ,  xZcCil  )  ; 

> 


fclose (tout  1 )  ; 
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double  rho,  bet  a,  x3,  g,  x4,  x5,  c,  h  ; 
double  xl  ,  x£  ,  drag  ; 
double  x  ,  krho  ,  rhotf  ,  st  line  ; 
double  xlp[51003  ,  x£p[51003  ,  xlc[51003 
,  xcc  C51  003  ,  rhop[5l003  ,  rhoc[5l00>3  ; 

int  i  ,  i  ,  m  ,  nstep  ,  nprint  ; 

*1  include  (rnatn.  n) 

#  inci  ude  <st  d  io.  h> 

rna  i  n  ( ) 

< 


tout£  =  foperi<  "  t  derno£.  out  "  ,  Mw"  )  ; 


/***  xlp  Denotes  xl  p red.  xlc  denotes  xl  corrector  ***/ 
/***  initial  conditions  need  to  be  specified  are: 


xl  pC  0 

4  3 

x£p[  0 

- 

4  3 

x  lc  C  0 

— 

£  3 

x£c  [  0 

£  5 

***  / 

x 1 p  C03 

XL 

xl  ; 

x 1 c  C03 

S 

x  1  ; 

x£c [03 

s 

x£ 

• 

* 

x£p[ 03 

S 

x£  ; 

x£'p  [  1 3 

xl p[ i3 

S 

xl  ; 

xlp[£3 

S 

x  1  ; 

x2p [£3 

ss 

x£ 

• 

f 

x2p [£3 

— 

x£  ; 

x 1 p[ 33 

s 

xl  ; 

x  1  p  [43 

= 

x  1  ; 

x£p  C33 

= 

x£ 

• 

1 

x£p [43 

s 

x2  ; 

x  1  c  [  1  3 

c 

x  1  ; 

x lc  C£3 

= 

x  1  ; 

x£c  C13 

= 

x£ 

* 

x£c [£3 

= 

x£  ; 

rhop[03  =  rho0*exp(  -x  1  p [I?  1 /krho)  ;  rhocC03  =  rho0*exp(  -xlc[03/krbo  ) 

rhopCl3  =  rho0*exp<  -x 1 p C 1 3 /krho)  ;  rhocCl3  =  rho0*exp<  -x 1 c [ 1 3 / kr ho  ) 

rhopC£3  =  rho0*exp(  -x  1  p C£3 /krho )  ;  rhocC£3  =  rho0*exp(  -x 1 c [£3 /krho  ) 

rhopC33  =  rho0*exp(  -x 1 p C33 /krho)  ; 

rhopC43  =  rho0*exp(  -x 1 p [41 / xrho)  ; 

rhopC£3  =  rno0*exp(  -x 1 p C2 3 /krho  )  ; 

rhocii]  =  rho0*exp(  -xlc Cl  3 /nrho  )  ; 

/*•*#  run  only  predictor  equations  for  i  =  i  to  ootain  better  initial 
estimate  for  xlpC33  ,  xlpCA3  ,  x£pC31  ,  x£p[43  ***/ 

xlpCA3  *  xlcC03  -*■  4.  *h*(  -  x£p[£3  )  ; 

x2pC43  ■  x£cC03  ♦  4.*h*(  g  -  rhop CE3 *x£p [£3 *x£p CE3 / 2/ x3  )  ; 

xlpC33  *  xlc[03  1.5*h*(  -x£pC£3  -  x£p C 1 3  )  ; 

x£p[33  «  x2cC03  1.5«h*<  g 
-rhopC£3  *x£pt£3*x£pC23 /£/x3  g  - 
rhopC 1 3 *x£pC 1 3 *x2p C 1 3 /£/ x3  )  ; 

rhopC33  =  rho0*ex p < -x 1 p [ 33 /krho  )  ; 

rhopC43  -  rhc>0*exp  ( -x  1  p [43  / krho  )  ; 


f or  (  i  *  £  ;  i <«nstep  ;  )< 
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/***  predictor  equations  ***/ 

rhc<p!12*i3  =  rho0*exp<  -xlp[£*i3 /krho  )  ; 

rhopL£*i-13  =  rho0*exp(  -x 1 p [£* 1 - 1 3 /krho  ) 
rhoc[£*i-23  =  rho0*exp(  -x 1 c [2* i -23 /krho  ) 
rhocL£*i-33  *  rho0*exp<  -xlcC2*i-33 /krho  ) 


xlpC£*i+£3  =  xlc32*i-23  4.*h*(  -x£p[£*i3  )  ; 

x£p  C£*  l  -*-£i  *  x£cC£*i-£3  +  4.*h*<  g  -  rhop  [£*  l  3  *x2p  [£*  i  3  *x£p  C2*i  3 /£/  *2  )  ; 

xlp[£*i+13  =  xlcC£*i-£2  ♦  1.5*h*<  -x£p£2*i3  -  x£p[£*i-13>  ; 
x£p C£* i  -►  1  3  *  x£p££*i-£3  +  1.5*h*( 
g  -  rhop  Z£*  l  3  *  x£p  [£* l  3  *x£p  [£* l  3 /£/ x3 
g  -  rhop C£* i - i 3 *x£p [£* l - 1 3 *x£p [£* i- 1 3 /£/ *3  >  ; 


/***  corrector  equations  ***/ 

xlcC2*i3  =  xlcC£*i-33  +  .5*h*(  3.*x£p[£*i3  -  9*x£p[£*i-13  )  ; 

x£cC£*i3  =  x£c  ££*  i -33  -  .5*n*<  3.  *  <  g  -  rhopC£*i3* 
xc'p  ££*  l  3  *x£p  C2«  i  3 /£.  /x3)  -  9.*<  g  -  rhop C£* l - 1 3 *x£p ££*i -1 3* 
x£p££«;-13 /£. /x3  ))  ; 

xlc££*i-13  =  x  1  c  [£* i~33  -  £. *h*x£c t£* i -£3  ; 
x£cC£*i-l3  *  x£c  £2*  l  -33  2.  *h*(  g  - 

rhoc C£* i -£3 *x£c ££*i -23 *x£c ££«i -£3 /£. /x3  )  ; 

> 

fort  1  =  1  ;  1  <=  10  ;  ♦+!  H  m  =  riprint*l  ; 

st  irne  *  m*  n  ; 

f  pr  intf  (tout£  ,"\n\n  time  *  747.  3f  sec  xlc  =  747.  3f  ft  x£c  =  747.  3f  ft /sec  " 
,  st  line  ,  xlcCmU  ,  x2c£m3  )  ; 

pr  intf  ( "  \n\n  time  =  747. 3f  sec  xlc  =  747.  3f  ft  x£c  =  747.  3  f  ft/sec  " 

,  stime  ,  xlc£m3  ,  x£c£m3  )  ; 

> 

f close (tout£ )  ; 

> 
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Target  Height  Estimation 

Figure  4-2  shows  the  noisy  target  height  measurements  and  the  EKF 
estimate  of  the  target's  height  above  the  earth.  Note  that  the  EKF  esti¬ 
mates  filter  (or  reduce)  the  measurement  noise  considerably.  Also,  note 
that  the  EKF  tracks  the  data  quite  well  as  the  target  descends  toward 
earth. 

Target  Velocity  Estimation 

The  target's  velocity  is  illustrated  in  Figure  4-3.  Note  that  the 
EKF  converges  to  the  actual  velocity  within  four  seconds.  The  target's 
velocity  is  essentially  constant  at  this  altitude  making  target  prediction 
easier  once  the  EKF  has  converged.  This  knowledge  can  be  built  into  the 
knowledge  base  for  the  expert  system  that  manages  the  EKF. 

Line-of-Sight  (LOS)  Angle  Estimation 

Figure  4-4  shows  the  LOS  angle  estimate  from  the  EKF  for  test  case 
// 1.  The  LOS  angle  is  relatively  steep  (87°  on  average)  and  decreasing  as 
the  target  height  declines.  This  follows  because  the  observer  is  station¬ 
ary  while  the  target  is  moving.  Because  the  tartet  is  250,000  feet  above 
sea  level,  it  follows  that  the  LOS  angle  be  large  (approaching  90°). 

Based  on  the  change  in  the  LOS  angle  estimate  per  unit  time  (2°/4  seconds 
■  0.5°  per  second),  it  would  take  176  seconds  to  reduce  the  LOS  angle  to 
zero. 

Target  Range  Estimation 

The  target  range  estimate  from  the  EKF  is  given  in  Figure  4-5.  Note 
that  the  target  is  getting  closer  to  the  observer  at  a  rate  of  250,000 
ft/sec.  Hence,  within  ten  seconds  the  target  will  hit  the  observer  unless 
some  action  is  taken.  Thus,  the  EKF  must  be  capable  of  rapidly  updating 
the  range  estimates  (in  under  one  second)  to  be  effective  for  WSMR 
applications. 

Summary 


This  test  case  illustrates  the  class  of  computations  required  for 
WSMR  target  tracking.  Squares,  divides,  square  roots,  exponentials  and 


-  Target  Height 
Mcaau  resents 


Ttae  (see) 


resents  and 


Estimated  Target 


Heigh 


—  — Actual  Target  Velocity 


Estimated  Tarjcc  Velocity 


86° 


E*C lasted  LOS  Ancle 


Figure  4-4  Es:i.-.:e_  Line-oi -Sight  (LCS)  Angle  From  EKF 


Figure  4-5  Estimated  target  Range  From  EKF 
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trigonometric  functions  are  needed.  In  addition,  the  extended  Kalman 
filter  requires  the  solution  of  nonlinear  ordinary  differential  equations. 
Thus,  high-speed  nonlinear  function  evaluation  is  required  to  solve  target 
tracking  problems  on  a  timely  basis.  This  test  case,  although  simple,  can 
be  solved  relatively  easily  to  provide  a  known  solution  to  verify  the 
parallel  Kalman  filter  algorithms  and  architectures.  This  test  case  can 
be  expanded  to  three  dimensions  using  angle-only  measurements  of  the 
target.  In  this  case,  the  target  tracking  problem  becomes  more  nonlinear 
and  involves  additional  trig  functions  to  be  computed  during  coordinate 
transformations . 
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SECTION  5 


CONCLUSIONS  AND  RECOMMENDATIONS 


5.1  CONCLUSIONS 


Based  on  the  results  of  our  Phase  I  study,  the  following  conclusions  can 
be  drawsn: 

o  It  is  technically  feasible  to  decouple  the  predictor  and  corrector 
equations  in  a  standard  Kalman  filter  for  parallel  processing  on 
multiple  processors. 

o  The  decoupling  principle  allows  the  parallel  Kalman  filter's 
predictor  and  corrector  equations  to  be  computed  on  separate 
processors  improving  computational  speed  directly  proportional 
to  the  number  of  available  processing  elements. 

o  It  is  feasible  to  extend  the  linear  PKF  theory  to  nonlinear  target 
tracking  and  estimation  problems  allowing  an  extended  Kalman  filter 
to  run  on  multiple  parallel  processors. 

Because  both  linear  and  nonlinear  filtering  can  benefit  from  the  decoupling 
principle,  this  research  activity  appears  well  suited  for  transition  to  the 
Phase  II  stage  of  the  SBIR  program. 


5.2  RECOMMENDATIONS 


Based  on  the  conclusions  derived  from  our  Phase  I  results,  the  following 
recommendations  are  presented: 

o  Continue  coding  and  evaluating  the  PKF  algorithms  on  a  parallel 
computer  whose  architecture  can  be  reconfigured  to  validate  newly 
developed  target  tracking  algorithms  and  architectures.  Many 
issues  regarding  the  implementation  of  the  parallel  Kalman  filter 
can  be  learned  by  coding  the  PKF  algorithms  and  architectures. 

For  example,  timing,  synchronization,  drift,  potential  divergence 
of  the  error  covariance  update,  and  model  sensitivities  could  have 
a  major  impact  on  the  ultimate  application  of  the  PKF.  Hence,  it 
is  recommended  that  an  expert  system  be  developed  to  manage  PKF 
computations . 


Create  a  parallel  computing  testbed  facility  (based  on  Industry- 
standard  hardware  and  software).  A  f lexible/reconf igurable 
parallel  computing  testbed  is  recommended  to  rapidly  test  and 
evaluate  the  performance  of  newly  developed  parallel  processing 
algorithms  and  architectures.  Because  WSMR  target  tracking  appli¬ 
cations  tend  to  be  nonlinear,  a  scalable  architecture  (i.e., 
expandable  based  on  problem  size)  for  nonlinear  function  evalua¬ 
tion  is  recommended.  General-purpose  microprocessor/coprocessor 
technology  augmented  by  a  64-bit  floating-point  25  MFLOP  array 
processor  board  set  is  recommended  to  accommodate  a  wide-class 
of  parallel  algorithms.  Four  processors  per  card  are  recommended 
to  simultaneously  compute  the  equations  in  the  decoupled  PKF 
(i.e.,  two  processors  for  the  predictor  and  two  processors  for 
the  corrector  per  card) .  Multiple  cards  (say  twelve)  can  be 
installed  in  the  testbed  to  provide  25  MFLOPs  of  64-bit  nonlinear 
function  evaluation  power.  An  industry-standard  Vme  bus  is  also 
recommended  for  several  reasons:  (1)  Vme  is  a  high-performance 
bus,  (2)  Vme  is  supported  by  several  major  companies  allowing  the 
government  to  add  "special  function"  cards  to  the  system,  and  (3) 
Vme  is  also  standard  in  high-rel,  milspec  and  ruggedized  systems 
for  actual  field  test  of  our  PKF  technology. 

Use  actual  flight  test  data  to  show  the  benefits  of  the  PKF 


technology .  Because  of  the  complexity  of  realistic  target 
tracking  applications,  it  is  anticipated  that  even  today's  super¬ 
computer  architectures  will  not  be  capable  of  solving  these 
problems  in  real  time.  Due  to  the  unique  matching  of  the  PKF 
algorithms  and  architectures,  it  is  anticipated  that  problems 
that  could  not  be  s  lved  otherwise  in  a  reasonable  time  (at  a 
reasonable  cost)  can  be  solved  on  the  proposed  testbed.  Thus, 
it  is  recommended  that  a  target  tracking  problem  based  on  actual 
flight  test  data  be  solved  and  benchmark  performance  documented 
so  that  future  designs  can  be  compared.  Due  to  the  "special" 
architecture  of  the  recommended  parallel  computing  system,  it 
is  anticipated  that  it  can  be  the  standard  to  improve  upon  for 
the  next  five  years. 


5.3  SUMMARY 

Based  on  the  results  in  this  report,  it  is  clear  that  the  PKF  theory  is 
well  developed,  mature  and  ready  to  proceed  to  full-scale  validation  on  actual 
flight  test  data  on  a  parallel  processing  testbed.  Because  the  PKF  technology 
has  been  needed  to  solve  several  applications  in  the  DoD  for  more  than  a 
decade,  it  is  anticipated  that  once  fully  developed  this  technology  can 
benefit  several  sectors  of  the  DoD.  This  is  possible  because  the  necessary 
technology  (e.g.,  25  MFLOP  64-bit  floating-point  adders/multipliers/dividers 
and  25  MHz  68030/68882  general-purpose  microprocessors)  has  only  recently  been 
available  to  transition  the  PKF  theory  into  practice.  Hence,  Systolic  Systems 
would  be  pleased  to  continue  this  program  under  Phase  II  of  the  SBIR  program. 
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